Sentence Planning Corpus for NLG

If you use this data in your research, please cite: Can Neural Generators for Dialogue Learn Sentence Planning and Discourse Structuring? L. Reed, S. Oraby, and M. Walker. INLG 2018, Tilburg, The Netherlands.


Overview: This dataset provides training data for natural language generation (NLG) covering various kinds of sentence planning operations.

SENTENCE PLANNING: a set of ~205K pairs of meaning representations (MRs) and natural language utterances that demonstrate discourse relations such as contrast and justification, as well as the use of aggregation operators in NLG.
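This README does not specify how the MRs are written inside the CSV files. The sketch below assumes an E2E-style format of comma-separated `slot[value]` pairs and turns one MR string into a dictionary so it can be inspected alongside its utterance. The format, the `parse_mr` helper, and the example slot names and values are all hypothetical; check a few rows of the actual files before relying on it.

```python
import re

# ASSUMPTION: MRs are comma-separated "slot[value]" pairs (E2E-style).
# This is not confirmed by the corpus documentation; verify against the data.
SLOT_PATTERN = re.compile(r"(\w[\w ]*?)\[([^\]]*)\]")


def parse_mr(mr):
    """Parse an MR string such as 'name[X], price[Y]' into a {slot: value} dict."""
    return {slot.strip(): value.strip() for slot, value in SLOT_PATTERN.findall(mr)}


if __name__ == "__main__":
    # Made-up example MR; not taken from the corpus.
    example = "name[Restaurant A], cuisine[Italian], price[expensive]"
    print(parse_mr(example))
```

If the real MRs use a different delimiter or nesting, only `SLOT_PATTERN` should need to change.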

Data: The data available for download consists of 3 CSV files containing meaning representations and their corresponding natural language utterances, generated using the Personage statistical generator (Mairesse and Walker, 2010):
  - nyc_train.csv: the training data for the contrast experiment, with 76,823 utterances
  - distributive_train.csv: the training data for the distributive experiment, with 63,690 utterances
  - sentence_scoping_train.csv: the training data for the sentence scoping experiment, with 64,442 utterances
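After downloading, a quick sanity check is to count the records in each file and compare them with the figures listed above; the three counts sum to 204,955, which is the ~205K total. The sketch below uses only the standard `csv` module and assumes each file starts with a single header row and that the files sit together in one directory. `count_rows` and `check_corpus` are hypothetical helper names, not part of the corpus release.

```python
import csv
from pathlib import Path

# Utterance counts as stated in this README.
EXPECTED_COUNTS = {
    "nyc_train.csv": 76823,
    "distributive_train.csv": 63690,
    "sentence_scoping_train.csv": 64442,
}


def count_rows(path):
    """Count CSV records (not physical lines), skipping one header row.

    ASSUMPTION: each file has exactly one header row.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the assumed header row
        return sum(1 for _ in reader)


def check_corpus(data_dir):
    """Return {filename: (actual_rows, expected_rows)} for files found in data_dir."""
    results = {}
    for name, expected in EXPECTED_COUNTS.items():
        path = Path(data_dir) / name
        if path.exists():
            results[name] = (count_rows(path), expected)
    return results


if __name__ == "__main__":
    print("Expected total pairs:", sum(EXPECTED_COUNTS.values()))
    for name, (actual, expected) in check_corpus(".").items():
        status = "OK" if actual == expected else "MISMATCH"
        print(f"{name}: {actual} rows (expected {expected}) {status}")
```

Counting records through `csv.reader` rather than raw lines keeps the count correct even if some utterances contain quoted line breaks.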

For more details about these experiments, please refer to the paper cited above.

Download: Fill out the following form to download the sentence planning NLG corpus.

Download Sentence Planning NLG Corpus