Updated Corpus: Film Corpus 2.0
If you use this data in your research, please refer to and cite:
Overview: The film corpus consists of 862 film scripts from The Internet Movie Script Database (IMSDb) website (http://www.imsdb.com/), representing 7,400 characters, with a total of 664,000 lines of dialogue and 9,599,000 tokens. Our snapshot of IMSDb is from May 19, 2010.
Download: Fill out the following form to download Film Corpus 1.0.
GitHub: https://github.com/zhichaohu/film-corpus-1
Website last updated June 21, 2024.