If you use this data in your research, please refer to and cite: Amita Misra, Pranav Anand, Jean E. Fox Tree, Marilyn Walker. "Using Summarization to Discover Argument Facets in Online Idealogical Dialog", In The North American Chapter of the Association for Computational Linguistics (NAACL), Denver, Colorado, 2015.
Overview: This is a corpus of human-written summaries of online debate dialogues used to discover argument facets. The idea is that the summaries can be used to discover what is 'naturally salient' to someone who is reading or perusing dialogues about these topics. The pyramid annotation scheme is applied to the summaries, and the "top tiers" of the pyramid are assumed to be the important aspects of the argumentative dialogue. Then we create a pyramid of pyramids, across many dialogues on a topic, in order to discover the aspects of the arguments that are repeatedly brought up on an issue.
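The pyramid idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline. It assumes each summary has already been annotated as a set of content-unit labels, that matching labels have been normalized to identical strings, and that a unit's tier is the number of summaries mentioning it. The function names, the example labels, and the `min_weight` threshold are all hypothetical.

```python
from collections import Counter


def build_pyramid(summaries):
    """Map each content-unit label to the number of summaries that mention it."""
    weights = Counter()
    for units in summaries:
        weights.update(set(units))  # count a unit at most once per summary
    return dict(weights)


def top_tiers(pyramid, min_weight):
    """Return the units mentioned by at least `min_weight` summaries."""
    return {unit for unit, weight in pyramid.items() if weight >= min_weight}


def pyramid_of_pyramids(dialogues, min_weight):
    """Count, across dialogues, how often a unit reaches the top tiers."""
    across = Counter()
    for summaries in dialogues:
        across.update(top_tiers(build_pyramid(summaries), min_weight))
    return dict(across)


# Hypothetical annotations: two dialogues, five summaries each.
dialogue_a = [
    {"guns deter crime", "background checks"},
    {"guns deter crime"},
    {"guns deter crime", "second amendment"},
    {"background checks", "guns deter crime"},
    {"second amendment"},
]
dialogue_b = [
    {"guns deter crime"},
    {"guns deter crime", "accidents at home"},
    {"accidents at home"},
    {"accidents at home", "guns deter crime"},
    {"accidents at home"},
]

pyramid_a = build_pyramid(dialogue_a)
facet_counts = pyramid_of_pyramids([dialogue_a, dialogue_b], min_weight=3)
```

In practice, two summarizers rarely phrase the same point identically, so deciding that two units express the same facet is itself a similarity judgment rather than an exact string match, which this sketch sidesteps by assuming pre-normalized labels.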
Data: The corpus consists of 225 summaries: 5 different summaries, produced by trained summarizers, for each of 45 dialogue excerpts on topics of social and political importance such as gun control, gay marriage, the death penalty, and abortion. The dialogues are drawn from one of our other corpora, the Internet Argument Corpus (IAC), available from the same menu as this corpus.
Abstract: More and more of the information available on the web is dialogic, and a significant portion of it takes place in online forum conversations about current social and political topics. We aim to develop tools to summarize what these conversations are about. What are the CENTRAL PROPOSITIONS associated with different stances on an issue; what are the abstract objects under discussion that are central to a speaker’s argument? How can we recognize that two CENTRAL PROPOSITIONS realize the same FACET of the argument? We hypothesize that the CENTRAL PROPOSITIONS are exactly those arguments that people find most salient, and use human summarization as a probe for discovering them. We describe our corpus of human summaries of opinionated dialogs, then show how we can identify similar repeated arguments, and group them into FACETS across many discussions of a topic. We define a new task, ARGUMENT FACET SIMILARITY (AFS), and show that we can predict AFS with a .54 correlation score, versus an ngram system baseline of .39 and a semantic textual similarity system baseline of .45.
Download: Fill out the following form to download the Argumentative Dialog Summary Corpus.
Contact: Please direct questions to Amita Misra: amisra2 [at] ucsc [dot] edu