Dates
Monday, July 18, 2022 - 02:00pm to Monday, July 18, 2022 - 04:00pm
Location
New Computer Science Building, Room 220
Event Description

Abstract: Books provide a rich resource of data: they not only spread information among people, but also offer a wealth of knowledge that can serve as training data for machines. Language models have evolved immensely in recent years with the advent of the Transformer architecture (e.g., BERT, GPT) and have greatly accelerated the field of natural language processing. While these models have been shown to perform well on short texts, they do not scale well to long texts such as books. We therefore apply these language models, along with other techniques such as alignment methods based on dynamic programming, to interesting problems related to English fiction. In particular, we create a large-scale resource of processed books, together with novel cleaning methods that handle OCR errors in the original transcriptions. We then tackle questions about understanding narrative flow by examining elements such as the flow of time in a book. Finally, we explore the task of chapter ordering: reconstructing the original order of a novel's chapters given a random permutation of the text.
--------------------------

In the event of a successful defense, a reception will follow.

Event Title
Ph.D. Thesis Defense: Allen Kim, 'Understanding Books through Graphs and Language Models'