Updated Corpus: Internet Argument Corpus V2
If you use this data in your research, please refer to and cite: Marilyn A. Walker, Pranav Anand, Jean E. Fox Tree, Rob Abbott, Joseph King. "A Corpus for Research on Deliberation and Debate." In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC), Istanbul, Turkey, 2012.
Overview: The Internet Argument Corpus (IAC) is a corpus for research on political debate in internet forums. It consists of ~11,000 discussions, ~390,000 posts, and ~73,000,000 words. Subsets of the data have been annotated for topic, stance, agreement, sarcasm, and nastiness, among others.
The Data: The data is stored in JSON files, with most annotations in CSV format (see the included readme for details). Python code to load and use the data is included. The zip archive is 158 MB.
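For a first look at the files without the bundled loader, the following is a minimal sketch that uses only the Python standard library. It is not the code shipped with the corpus. The function names (load_discussion, load_annotations, iter_discussions) are hypothetical, and the sketch makes no assumption about directory layout, JSON structure, or CSV column names. Those are described in the included readme.

```python
import csv
import json
from pathlib import Path


def load_discussion(path):
    """Parse one JSON file and return whatever structure it contains."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def load_annotations(path):
    """Read a CSV annotation file into a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def iter_discussions(directory):
    """Yield (filename, parsed JSON) for every .json file in a directory."""
    for json_path in sorted(Path(directory).glob("*.json")):
        yield json_path.name, load_discussion(json_path)
```

Point these helpers at the paths given in the readme after unpacking the archive. Every CSV value comes back as a string, so numeric annotation scores need an explicit conversion before use.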
Works that use this corpus:
- Rob Abbott, Marilyn Walker, Pranav Anand, Jean E. Fox Tree, Robeson Bowmani, and Joseph King. "How can you say such things?!?: Recognizing Disagreement in Informal Political Argument". In Proceedings of the Workshop on Language in Social Media (LSM), Portland, Oregon, USA, 2011.
- Marilyn A. Walker, Pranav Anand, Jean E. Fox Tree, Rob Abbott, Joseph King. "A Corpus for Research on Deliberation and Debate." In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC), Istanbul, Turkey, 2012.
Download: Fill out the following form to download version 1 of the Internet Argument Corpus.
GitHub: https://github.com/sl-m-lab/Internet-Argument-Corpus/
Contact: Please direct questions to Rob Abbott: abbott [at] soe [dot] ucsc [dot] edu
Website last updated June 21, 2024.