Abstract: Extracting accurate and credible insights from unstructured natural language data is a fundamental challenge across domains such as clinical text and online content. In clinical settings, extracting key information from medical notes enables applications in decision support, risk assessment, and public health monitoring. Recent advances in clinical natural language processing have enabled techniques such as semantic similarity modeling and medication event extraction, which improve the precision of insight extraction from text. However, reliably extracting insights and validating claims from noisy text data remains an open challenge. This connects to emerging work on how people perceive and validate online health information, which reveals misinformation across genres. With misinformation spreading rapidly across online platforms, scalable natural language processing techniques are essential for combating potentially misleading content: extracting check-worthy claims, analyzing logical fallacies, assessing user emotions, and determining content credibility from indicators such as author reputation. This thesis develops natural language processing techniques to extract clinical insights, evaluate online content credibility, analyze logical fallacies, and understand social perspectives, spanning domains including healthcare, public health communications, and social media discourse.
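To ground the semantic similarity component mentioned above, the following is a minimal sketch of scoring similarity between two clinical sentences with the sentence-transformers library; the model name and example sentences are illustrative assumptions, not the systems developed in the thesis (clinical work would typically use a domain-tuned encoder).

# Minimal semantic similarity sketch (illustrative, not the thesis's model).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # generic encoder; stands in for a clinical one
a = model.encode("Patient reports chest pain radiating to the left arm.")
b = model.encode("Complains of left-sided chest discomfort.")
print(float(util.cos_sim(a, b)))  # cosine similarity; higher means closer in meaning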
In the context of the COVID-19 pandemic, the rapid and widespread dissemination of misinformation across online platforms, especially on Twitter, underscores the need for scalable natural language processing techniques. We introduce a COVID-19 Twitter dataset and present a three-stage process to (i) determine whether a given Tweet is check-worthy; if so, (ii) identify which portion of the Tweet ought to be checked for veracity, framed as a sequence labeling task; and finally, (iii) determine the author's stance toward the claim in that Tweet. The ability to automatically detect check-worthy claims and determine stance in subjective online discourse has important implications for mitigating misinformation during public health crises and for understanding evolving social perceptions more broadly. We also conduct a comparative analysis of how influential individuals' and organizations' Twitter accounts respond to COVID-19 topics, providing insight into the dynamics shaping the dissemination of pandemic information and the complex interplay in digital discourse. We contribute to the understanding of misinformation during COVID-19 by releasing multiple annotated datasets, developing unified models across datasets and task formulations, and sharing insights that advance the study of pandemic misinformation.
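As an illustration of how such a three-stage pipeline fits together, here is a schematic sketch in Python; the stage functions are placeholders standing in for trained classifiers, and all names and label sets are hypothetical rather than the models described above.

# Schematic three-stage claim-checking pipeline (illustrative placeholders only).
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ClaimAnalysis:
    check_worthy: bool                # stage (i): is the Tweet worth checking?
    claim_span: Optional[List[str]]   # stage (ii): tokens labeled as the claim
    stance: Optional[str]             # stage (iii): e.g. "support" / "deny" / "neutral"

def analyze_tweet(tokens: List[str],
                  is_check_worthy,    # stage (i): binary Tweet-level classifier
                  tag_claim_tokens,   # stage (ii): sequence labeler over tokens
                  classify_stance) -> ClaimAnalysis:  # stage (iii): stance classifier
    if not is_check_worthy(tokens):
        return ClaimAnalysis(False, None, None)  # stop early: nothing to verify
    span = [t for t, tag in zip(tokens, tag_claim_tokens(tokens)) if tag != "O"]
    return ClaimAnalysis(True, span, classify_stance(tokens, span))

# Toy usage with stand-in models, just to show the control flow:
tokens = "masks cause oxygen deprivation say doctors".split()
result = analyze_tweet(
    tokens,
    is_check_worthy=lambda toks: True,
    tag_claim_tokens=lambda toks: ["B"] + ["I"] * 3 + ["O"] * (len(toks) - 4),
    classify_stance=lambda toks, span: "support",
)
print(result)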
The spread of misinformation online enables whataboutism tactics, while whataboutism in turn allows misinformation to propagate unchecked. We delve into whataboutism, aiming to distinguish it from the classical tu quoque fallacy. We demonstrate that whataboutism is distinct from both propaganda and tu quoque, presenting new datasets from Twitter and YouTube to support this distinction. Our study explores transfer learning and similarity-based approaches for detecting whataboutism, highlighting the challenges that current methods face. Using instruction-based prompting, we introduce a unified model for recognizing whataboutism across the available datasets, and we discuss areas where such models can be improved.
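To illustrate the instruction-based prompting setup, the following is a minimal sketch using an off-the-shelf instruction-tuned model via Hugging Face transformers; the prompt wording, model choice, and yes/no label set are assumptions for illustration, not the thesis's actual configuration.

# Minimal instruction-prompting sketch for whataboutism detection (illustrative).
from transformers import pipeline

generator = pipeline("text2text-generation", model="google/flan-t5-base")

def detect_whataboutism(comment: str) -> str:
    prompt = (
        "Instruction: Decide whether the comment deflects criticism by pointing "
        "to someone else's faults (whataboutism). Answer 'yes' or 'no'.\n"
        f"Comment: {comment}\nAnswer:"
    )
    return generator(prompt, max_new_tokens=3)[0]["generated_text"].strip()

print(detect_whataboutism("Sure our policy failed, but what about their scandal?"))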
Held on Zoom; contact events [at] cs.stonybrook.edu for access.
Dates
Thursday, April 18, 2024 - 02:00pm to 04:00pm
Event Title
Ph.D. Thesis Defense: Noushin Salek Faramarzi, 'From Clinical Notes to Online Content: Computational Models for Extracting Insights and Understanding Misinformation'