Toward Comprehensive Broad-Coverage Cross-Document Event Extraction
| Published in: | ProQuest Dissertations and Theses (2024) |
|---|---|
| Publisher: | ProQuest Dissertations & Theses |
| Online access: | Citation/Abstract, Full Text - PDF |
| Abstract: | Event extraction in natural language processing has traditionally focused on sentence-level analysis, limiting our ability to comprehensively understand complex events described across multiple documents. This dissertation introduces a novel approach to comprehensive broad-coverage cross-document event extraction, addressing key limitations of existing methods. The research presents a series of interconnected tasks and datasets that bridge the gap between sentence-level and document-level event understanding, with the ultimate goal of enabling automated extraction of all events from collections of related documents. The work progresses from the fundamental tasks of Frame Identification and Source Validation to the introduction of the Frames Across Multiple Sources (FAMuS) corpus, which enables Cross-Document Argument Extraction. Building on these foundations, the dissertation explores methods for extracting multiple event instances within single documents, a crucial step toward fully automated cross-document event extraction. A key innovation is the consistent use of a broad-coverage ontology, FrameNet, as the underlying representation of events, allowing a wide range of event types to be extracted across diverse domains. This approach, combined with the development of novel annotation methodologies and silver-data generation techniques, lays the groundwork for more flexible and scalable event extraction systems. By addressing the challenges of document-level analysis, cross-document information synthesis, and multi-instance event extraction, this research paves the way for comprehensive cross-document event extraction. |
|---|---|
| ISBN: | 9798346858164 |
| Source: | Publicly Available Content Database |