Improving Scientific Information Extraction With Text Generation

Uloženo v:
Podrobná bibliografie
Vydáno v:ProQuest Dissertations and Theses (2025)
Hlavní autor: Zeng, Qingkai
Vydáno:
ProQuest Dissertations & Theses
Témata:
On-line přístup:Citation/Abstract
Full Text - PDF
Tagy: Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!

MARC

LEADER 00000nab a2200000uu 4500
001 3184961716
003 UK-CbPIL
020 |a 9798310167896 
035 |a 3184961716 
045 2 |b d20250101  |b d20251231 
084 |a 66569  |2 nlm 
100 1 |a Zeng, Qingkai 
245 1 |a Improving Scientific Information Extraction With Text Generation 
260 |b ProQuest Dissertations & Theses  |c 2025 
513 |a Dissertation/Thesis 
520 3 |a As research communities expand, the number of scientific articles continues to grow rapidly, with no signs of slowing. This information overload drives the need for automated tools to identify relevant materials and extract key ideas. Information extraction (IE) focuses on converting unstructured scientific text into structured knowledge (e.g., ontologies, taxonomies, and knowledge graphs), enabling intelligent systems to excel in tasks like document organization, scientific literature retrieval and recommendation, claim verification even novel idea or hypothesis generation. To pinpoint the scope of this thesis, I focus on the taxonomic structure in this thesis to represent the knowledge in the scientific domain.To construct a taxonomy from scientific corpora, traditional methods often rely on pipeline frameworks. These frameworks typically follow a sequence: first, extracting scientific concepts or entities from the corpus; second, identifying hierarchical relationships between the concepts; and finally, organizing these relationships into a cohesive taxonomy. However, such methods encounter several challenges: (1) the quality of the corpus or annotation data, (2) error propagation within the pipeline framework, and (3) limited generalization and transferability to other specific domains. The development of large language models (LLMs) offers promising advancements, as these models have demonstrated remarkable abilities to internalize knowledge and respond effectively to a wide range of inquiries. Unlike traditional pipeline-based approaches, generative methods harness LLMs to achieve (1) better utilization of their internalized knowledge, (2) direct text-to-knowledge conversion, and (3) flexible, schema-free adaptability.This thesis explores innovative methods for integrating text generation technologies to improve IE in the scientific domain, with a focus on taxonomy construction. The approach begins with generating entity names and evolves to create or enrich taxonomies directly via text generation. I will explore combining neighborhood structural context, descriptive textual information, and LLMs’ internal knowledge to improve output quality. Finally, this thesis will outline future research directions. 
653 |a Computer science 
653 |a Computer engineering 
653 |a Information science 
773 0 |t ProQuest Dissertations and Theses  |g (2025) 
786 0 |d ProQuest  |t Publicly Available Content Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3184961716/abstract/embedded/6A8EOT78XXH2IG52?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3184961716/fulltextPDF/embedded/6A8EOT78XXH2IG52?source=fedsrch