Open Access Thesis
English Language and Literatures
This study applies computational methods to a large dataset of popular fiction to see what topics emerge when the texts are viewed across genre lines and from a new, “machine” perspective. The dataset consists of 1,136 popular and commercially successful novels published between 2005 and 2016, comprising New York Times bestsellers and “genre fiction” such as science fiction, young adult, romance and mystery novels. Methods are discussed, including dataset preparation, LDA topic modeling and topic number optimization, qualitative topic interpretation, data analysis and visualization. The experiment was conducted in two parts: first with each full novel as the “document,” or unit of analysis, and then with every sentence of every novel (over 9 million in total) as a document. A total of 23 topics at the novel level and 66 at the sentence level were qualitatively interpreted, compared across genres and visualized. This study argues that computational tools can be used generatively to vastly broaden the scope of literary analysis, but that results must still be interpreted through qualitative means. The novel may be analyzed quantitatively at both the level of the entire novel and the level of the sentence, but sentence-level analysis offers more granular and interesting results. Topic modeling here identifies latent, ubiquitous topics that a human researcher may ignore or miss; re-centers research focus on the human body, its functions and the embodied nature of fiction; identifies conventions of the novel such as linearity, characterization and settings; and distills many socially relevant topics, including violence, surveillance and human institutions and activities. While topic modeling here reinforced some topical expectations based on genre conventions and tropes, topics also appeared unexpectedly in other genres, helping to re-imagine the popular fiction landscape outside of genre-based silos.
Statistical analysis of a dataset of fiction offers a new, bird's-eye view of the contemporary popular fiction landscape, but it also has many limitations, a number of which are discussed.
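The two units of analysis described in the abstract can be illustrated with a minimal sketch. It is not the thesis's actual pipeline: the function names, the toy corpus and the naive regex sentence splitter are all assumptions made for illustration. It shows only how one corpus yields two different document sets, one document per novel and one per sentence, each of which could then be passed to an LDA implementation.

```python
import re

def sentence_split(text):
    # Naive splitter (an assumption, not the thesis's method):
    # break after ., ! or ? when followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def build_documents(novels):
    # Novel level: each full text is one document.
    novel_docs = list(novels.values())
    # Sentence level: each sentence is one document, tagged with its
    # source novel so topics can later be compared across genres.
    sentence_docs = [
        (title, sentence)
        for title, text in novels.items()
        for sentence in sentence_split(text)
    ]
    return novel_docs, sentence_docs

# Hypothetical two-novel corpus standing in for the 1,136 novels.
novels = {
    "novel_a": "She ran. The door was locked! Who had the key?",
    "novel_b": "The ship left orbit. Nobody spoke.",
}
novel_docs, sentence_docs = build_documents(novels)
print(len(novel_docs), len(sentence_docs))
```

A real corpus would need a more robust sentence tokenizer, since abbreviations and dialogue punctuation defeat a splitter this simple.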
Lundy, M. (2020). Text Mining Contemporary Popular Fiction: Natural Language Processing-Derived Themes Across Over 1,000 New York Times Bestsellers and Genre Fiction Novels. (Master's thesis). Retrieved from https://scholarcommons.sc.edu/etd/5759