Ph.D. Thesis Defense, Charuta Pethe: 'Natural Language Processing for the Large-Scale Analysis of Literary Works'

Dates

Thursday, May 5, 2022, 11:00am to 1:00pm

Location

NCS 220

Event Description

Abstract:
Books have historically been a significant part of human life, as the primary mechanism to transmit knowledge and narratives across space and time. Novels are a particularly interesting class of texts that provide a view of how people think, behave, and interact. This thesis explores various tasks involved in the large-scale analysis of literary works.
(i) We explore the Stony Book pipeline for producing NLP annotations and for generating visualizations and aggregate analyses of novels from large corpora of books.
(ii) Text segmentation plays an important role in NLP applications including summarization and question answering. We present two approaches to address the task of segmenting a novel into coherent parts, and a dynamic programming approach for global break prediction.
(iii) Characters and their interactions provide the fundamental framework for narratives. We propose two new methods for obtaining high-quality character representations, powered by the Stony Book annotated dataset: graph neural network based embeddings, and occurrence pattern embeddings. We also test the quality of these embeddings using a new benchmark suite.
(iv) Audiobooks involve dramatic vocalizations and intonations by the human reader. We present a dataset of aligned pairs of book texts and audiobooks and an approach to enhance standard text-to-speech (TTS) by predicting prosody attributes, resulting in higher-quality machine-generated audiobooks.
(v) Some texts are more characteristic of the author than others. We present five approaches to obtain author characterization scores and verify that they are in agreement with human judgment.
