YelpNLG: Review Corpus for NLG


Submissions for this form are closed.

*The YelpNLG corpus and paper will be available soon for download. For any questions, please contact the authors.*

If you use these resources in your research, please cite: Curate and Generate: A Corpus and Method for Joint Control of Semantics and Style in Neural NLG. S. Oraby, V. Harrison, A. Ebrahimi, and M. Walker. To appear at ACL 2019, Florence, Italy.

Overview: YelpNLG provides resources for natural language generation (NLG) of restaurant reviews. The corpus consists of ~300,000 MR-to-NL pairs (meaning representations paired with natural language references), created from freely available Yelp restaurant reviews. The MRs include semantic information as well as a novel characterization of descriptive lexical choice, sentiment, length, personal pronouns, and exclamations. The lexicons include sets of values instantiating popular attributes from the restaurant domain, e.g. cuisine or food.

Corpus: The corpus available for download is an archive of train, dev, and test splits. Each split contains 4 MR files, one for each version of the MR (base, +adj, +sent, and +style), along with the corresponding NL references. The README includes data formatting details.
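As a rough illustration of how a split fits together, the Python sketch below pairs each MR line with its NL reference, for every MR version. It rests on hypothetical assumptions, not on the README: one example per line, a single reference file shared by all four MR versions of a split, and placeholder file names of the form <split>.<version>.mr and <split>.nl. Check the README for the actual layout before adapting it.

```python
from pathlib import Path
from typing import Dict, List, Tuple

# The four MR versions named above; the file naming used below is hypothetical.
MR_VERSIONS = ("base", "adj", "sent", "style")


def load_pairs(mr_path: Path, nl_path: Path) -> List[Tuple[str, str]]:
    """Pair the i-th MR line with the i-th NL reference line.

    Assumes one example per line in both files (hypothetical layout).
    """
    mrs = mr_path.read_text(encoding="utf-8").splitlines()
    refs = nl_path.read_text(encoding="utf-8").splitlines()
    if len(mrs) != len(refs):
        # Misaligned files would silently produce wrong pairs, so fail loudly.
        raise ValueError(
            f"{mr_path.name} has {len(mrs)} lines but {nl_path.name} has {len(refs)}"
        )
    return list(zip(mrs, refs))


def load_split(root: Path, split: str) -> Dict[str, List[Tuple[str, str]]]:
    """Load every MR version of one split against its NL references.

    '<split>.<version>.mr' and '<split>.nl' are placeholder names.
    """
    nl_path = root / f"{split}.nl"
    return {v: load_pairs(root / f"{split}.{v}.mr", nl_path) for v in MR_VERSIONS}
```

Keeping the versions in one dictionary makes it easy to train on, say, the +style MRs while evaluating against the same references as the base MRs.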

Lexicons: The lexicons available for download are an archive of text files, one for each attribute of interest (ambiance, cuisine, food, price, restaurant, service, and staff). Each file contains a set of values for the given attribute, one per line. Some values denote a particular instance of the attribute, while others denote a way to lexicalize the attribute. Details on how the lexicons were created can be found in the paper, and an example SPARQL query for retrieving foods is given in the README.
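A minimal Python sketch of reading the lexicons, assuming (hypothetically) that each attribute's file is named <attribute>.txt. The find_mentions helper is a naive, hypothetical illustration of looking up lexicon values in a sentence with word-boundary regular expressions; it is not the procedure used in the paper.

```python
import re
from pathlib import Path
from typing import Dict, List, Set, Tuple

# Attributes listed in the description above.
ATTRIBUTES = ("ambiance", "cuisine", "food", "price", "restaurant", "service", "staff")


def load_lexicons(root: Path) -> Dict[str, Set[str]]:
    """Read one lexicon file per attribute; each non-empty line is one value.

    '<attribute>.txt' is a hypothetical file name. Values are lowercased so
    that later matching is case-insensitive (an assumption of this sketch).
    """
    lexicons: Dict[str, Set[str]] = {}
    for attr in ATTRIBUTES:
        path = root / f"{attr}.txt"
        if not path.exists():
            continue  # skip attributes whose file is absent
        lines = path.read_text(encoding="utf-8").splitlines()
        lexicons[attr] = {line.strip().lower() for line in lines if line.strip()}
    return lexicons


def find_mentions(sentence: str, lexicons: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return (attribute, value) pairs whose value occurs as whole words.

    Multi-word values such as "pad thai" are matched as phrases. Results are
    ordered by attribute name, then by value.
    """
    text = sentence.lower()
    hits = []
    for attr in sorted(lexicons):
        for value in sorted(lexicons[attr]):
            if re.search(r"\b" + re.escape(value) + r"\b", text):
                hits.append((attr, value))
    return hits
```

Scanning every value with a regular expression is simple but slow for large lexicons such as food; an n-gram index would scale better.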

Download: Fill out the following form to download the YelpNLG corpus and lexicons.