Film Corpus 1.0

Updated Corpus: Film Corpus 2.0


If you use this data in your research, please refer to and cite:

Overview: Film Corpus 1.0 consists of 862 film scripts from the Internet Movie Script Database (IMSDb) website (http://www.imsdb.com/), representing 7,400 characters, with a total of 664,000 lines of dialogue and 9,599,000 tokens. The IMSDb snapshot was taken on May 19, 2010.
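To give a feel for the scale of the corpus, the totals in the overview can be turned into rough per-film, per-character, and per-line averages. The short Python sketch below does only that arithmetic. The constant and function names are illustrative and not part of the corpus distribution, and the results are approximate because the published totals appear to be rounded.

```python
# Totals as stated in the overview (apparently rounded figures).
FILMS = 862
CHARACTERS = 7_400
DIALOGUE_LINES = 664_000
TOKENS = 9_599_000


def per_unit_averages(films, characters, lines, tokens):
    """Return rough averages derived from the corpus totals.

    Hypothetical helper for illustration only; it is not shipped with the corpus.
    """
    return {
        "characters_per_film": characters / films,
        "lines_per_film": lines / films,
        "lines_per_character": lines / characters,
        "tokens_per_line": tokens / lines,
    }


averages = per_unit_averages(FILMS, CHARACTERS, DIALOGUE_LINES, TOKENS)
for name, value in averages.items():
    print(f"{name}: {value:.1f}")
```

These are means over the whole corpus, so they say nothing about how dialogue is distributed across individual films or characters.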

Download: Fill out the following form to download Film Corpus 1.0.

GitHub: https://github.com/zhichaohu/film-corpus-1

Website last updated June 21, 2024.

User Information