Internet Argument Corpus

Updated Corpus: Internet Argument Corpus V2

If you use this data in your research, please refer to and cite: Marilyn A. Walker, Pranav Anand, Jean E. Fox Tree, Rob Abbott, Joseph King. "A Corpus for Research on Deliberation and Debate." In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC), Istanbul, Turkey, 2012.

Overview: The Internet Argument Corpus (IAC) is a corpus for research on political debate in internet forums. It consists of ~11,000 discussions, ~390,000 posts, and ~73,000,000 words. Subsets of the data have been annotated for topic, stance, agreement, sarcasm, and nastiness, among other properties.

The Data: The data is stored in JSON files, with most annotations in CSV format (see the included README for details). Python code for loading and using the data is included. The zip archive is 158 MB.
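As a rough illustration of reading these two formats, here is a minimal sketch that uses only the Python standard library. It is not the loader shipped with the corpus. The one-discussion-per-file layout, the function names, and the paths and column names in the usage note are hypothetical, so check the included README for the actual structure.

```python
import csv
import json
from pathlib import Path


def load_discussions(json_dir):
    """Load every *.json file in json_dir into a dict keyed by file name stem.

    Assumption (hypothetical): one discussion per JSON file. The real
    corpus layout may differ; see the included README.
    """
    discussions = {}
    for path in sorted(Path(json_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            discussions[path.stem] = json.load(f)
    return discussions


def load_annotations(csv_path):
    """Read an annotation CSV into a list of dicts, one per row.

    Column names come from the CSV header row; all values stay as strings.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

For example, you might call `load_discussions("iac/discussions")` and `load_annotations("iac/annotations/stance.csv")`, where both paths are placeholders. The sketch leaves CSV values as strings rather than guessing at column types. Convert them once the README confirms what each column holds.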

Works that use this corpus:

Contact: Please direct questions to Rob Abbott: abbott [at] soe [dot] ucsc [dot] edu

