Factualizing Biomedical Event Extraction
| Published in: | PQDT - Global (2024) |
|---|---|
| Main author: | |
| Published: | ProQuest Dissertations & Theses |
| Subjects: | |
| Online access: | Citation/Abstract Full Text - PDF |
| Abstract: | Recent research describes a replication crisis in science due to an overwhelming amount of scientific literature and a corresponding decline in quality. Event extraction (EE), a subtask of natural language processing (NLP), has the potential to mitigate the ensuing information overload by automatically extracting structured knowledge representations and making scientific literature easily searchable. However, existing methods for EE focus on representing only the core aspects of events (who did what to whom), which amounts to assuming that all extracted events are true. We argue that this assumption is fundamentally problematic because it dooms EE systems to misrepresent the world, especially in the scientific domain, where claims are often expressed as false or uncertain. To address this problematic assumption, we propose a fundamental conceptual shift in EE from the representation of facts to the representation of beliefs. While a fact is necessarily (assumed to be) true, beliefs lie on a continuous scale of truth and uncertainty, which together constitute the notion of factuality. We thus propose that EE systems include de facto dimensions of polarity and certainty in addition to the core representation. The importance of these dimensions to EE is far from a new realization, but to our knowledge the necessity of fundamentally integrating factuality into the core event representation has not yet been established in NLP research. One contribution of this thesis is a set of theoretical arguments for this integration. Another contribution is the introduction of the task of Factualized Event Extraction (FEE), which targets the joint extraction of events' core representation alongside the expressed polarity and uncertainty. As part of this, we introduce SemMed Context, a new dataset for FEE in the biomedical domain, and describe the development of multi-task models for FEE based on variational autoencoders (VAEs) and pre-trained language models (PLMs). Experiments with both models show a large improvement over a baseline, and even suggest that multi-task learning of event content, polarity, and uncertainty performs as well as or better than learning separate models for each. In addition to good prediction performance, we introduce methods that lend interpretability to the model predictions: interpretable latent distributions in the VAE and an interpretable attention mechanism in the PLM-based model. |
| ISBN: | 9798304991056 |
| Source: | ProQuest Dissertations & Theses Global |