Source Blending in NLG

If you use this data in your research, please cite: Learning from Mistakes: Combining Ontologies via Self-Training for Dialogue Generation. L. Reed, V. Harrison, S. Oraby, D. Hakkani-Tür, and M. Walker. SIGDIAL 2020. Boise, Idaho.


Overview: The Source Blending NLG corpus contains 77k pairs of meaning representations (MRs) and natural language utterances in the restaurant domain, drawn from two source ontologies, NYC and E2E (train), plus 3,040 MRs that combine attributes from both sources (test).

Data: The download is a zip archive of two CSV files, one for train and one for test, containing meaning representations and their corresponding natural language utterances. Utterances are provided only for the training set.
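This page does not specify the CSV column names or the MR string syntax, so the Python sketch below is an assumption, not the official loader. It assumes E2E-style MRs such as "name[The Eagle], eatType[coffee shop]" and hypothetical column names "mr" (meaning representation) and "ref" (utterance); adjust both to match the actual files, especially for NYC attributes, whose format may differ. Each MR is parsed into a slot-to-value dictionary and paired with its utterance, or with None for the test file, which has no utterances.

```python
import csv
import io
import re
from typing import Dict, Iterator, Optional, TextIO, Tuple

# Assumption: MRs use the E2E-style "slot[value], slot[value]" convention.
SLOT_PATTERN = re.compile(r"\s*([^\[\],]+?)\s*\[([^\]]*)\]")


def parse_mr(mr: str) -> Dict[str, str]:
    """Turn 'name[X], food[Y]' into {'name': 'X', 'food': 'Y'}."""
    return {slot: value for slot, value in SLOT_PATTERN.findall(mr)}


def load_pairs(
    handle: TextIO,
    mr_column: str = "mr",            # hypothetical column name
    ref_column: Optional[str] = "ref",  # hypothetical; pass None for the test CSV
) -> Iterator[Tuple[Dict[str, str], Optional[str]]]:
    """Yield (parsed MR, utterance or None) for each CSV row."""
    for row in csv.DictReader(handle):
        mr = parse_mr(row[mr_column])
        # The test CSV has no utterances, so fall back to None.
        ref = row.get(ref_column) if ref_column else None
        yield mr, ref


if __name__ == "__main__":
    # Small in-memory sample standing in for the train CSV.
    sample = io.StringIO(
        'mr,ref\n"name[The Eagle], eatType[coffee shop]",The Eagle is a coffee shop.\n'
    )
    for mr, ref in load_pairs(sample):
        print(mr, ref)
```

For the real files, open each CSV with open(path, newline="", encoding="utf-8") and pass the handle to load_pairs; the csv module handles the quoted commas inside MR strings.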

Download: Fill out the following form to download the Source Blending NLG corpus.

Download Source Blending NLG Corpus