Adverse Event Prediction Using Natural Language Processing (NLP)

Bibliographic Details
Published in: ProQuest Dissertations and Theses (2025)
Main Author: Ahmed, Nishat
Published: ProQuest Dissertations & Theses
Subjects: Computer science; Computer engineering; Oncology; Bioinformatics
Online Access: Citation/Abstract
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3201915407
003 UK-CbPIL
020 |a 9798314872345 
035 |a 3201915407 
045 2 |b d20250101  |b d20251231 
084 |a 66569  |2 nlm 
100 1 |a Ahmed, Nishat 
245 1 |a Adverse Event Prediction Using Natural Language Processing (NLP) 
260 |b ProQuest Dissertations & Theses  |c 2025 
513 |a Dissertation/Thesis 
520 3 |a The increasing availability of digital health records offers unprecedented insight into the determinants of cancer outcomes. Nevertheless, methods to efficiently and accurately identify adverse events in large retrospective studies are limited. Natural language processing (NLP) is a potential solution for extracting valuable information from patient data, which is typically stored as unstructured text in isolated datasets within hospital systems. Given the success of large language models (LLMs) in natural language comprehension tasks, we explored the generation of a large labeled clinical notes dataset using a generative LLM with prompt engineering, and the efficacy of using this labeled dataset for fine-tuning encoder-based LLMs. By comparing the predictions of the generative LLM LLaMA 70B against a small sample of clinical notes annotated by an oncologist, we determined that it produces accurate note-level predictions of adverse event (AE) occurrence. We therefore used LLaMA 70B to annotate a dataset of 7,345 patients (412,530 clinical notes) from the MSK-IMPACT dataset. The performance of this annotated dataset in fine-tuning ModernBERT and Clinical Longformer to predict AE occurrence was compared with that of fine-tuning the same models on a version of the dataset annotated using clinical trial data. In this study, precision and recall scores of 0.80 are considered acceptable, as they reflect an optimal balance between accurate predictions and sufficient sensitivity. Our results indicate that the labels produced by LLaMA 70B outperform the labels derived from clinical trial data. To evaluate the quality of the LLaMA 70B-generated labels, we compared its predictions to our clinical trial data. On our training set of 5,875 patients, LLaMA 70B achieved a macro-averaged recall of 0.90, accuracy of 0.71, precision of 0.07, F1-score of 0.13, and specificity of 0.70. The evaluation metrics were similar for the test set. We find that LLaMA 70B note-level predictions serve as better labels than our clinical trial note-level labels, as both ModernBERT and Clinical Longformer performed better when trained and tested on the LLaMA 70B labels. Although we attempted to improve patient-level predictions against ground-truth patient-level clinical trial labels by smoothing the LLaMA 70B predictions and by further prompt engineering, these methods did not yield notable improvements. We find that manual inspection of LLaMA's note-level predictions by a medical expert is the best way to validate them, and the most effective approach to creating a clinical notes dataset with high-quality labels is to have medical experts manually annotate the notes. 
653 |a Computer science 
653 |a Computer engineering 
653 |a Oncology 
653 |a Bioinformatics 
773 0 |t ProQuest Dissertations and Theses  |g (2025) 
786 0 |d ProQuest  |t Publicly Available Content Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3201915407/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3201915407/fulltextPDF/embedded/L8HZQI7Z43R0LA5T?source=fedsrch
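
Illustrative example

The abstract in the 520 field above describes a two-stage pipeline: a generative LLM (LLaMA 70B) labels clinical notes for adverse-event occurrence via prompt engineering, and those note-level labels are then scored against clinical trial annotations using macro-averaged recall, precision, F1, accuracy, and specificity. The Python sketch below shows only the general shape of that evaluation step under stated assumptions; the prompt wording, variable names, and toy labels are illustrative and are not taken from the thesis.

# Illustrative sketch (assumptions, not the thesis code): score note-level
# adverse-event labels from a generative LLM against reference labels using
# the macro-averaged metrics named in the abstract.
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# A hypothetical zero-shot labeling prompt of the kind the abstract alludes
# to; the actual prompt engineering used in the thesis is not reproduced here.
PROMPT = (
    "Read the clinical note below and answer 'yes' if it documents an "
    "adverse event, otherwise 'no'.\n\nNote:\n{note}"
)

def macro_specificity(y_true, y_pred, labels=(0, 1)):
    """Macro-averaged specificity: mean true-negative rate over classes."""
    cm = confusion_matrix(y_true, y_pred, labels=list(labels))
    total = cm.sum()
    rates = []
    for i in range(len(labels)):
        tp = cm[i, i]
        fn = cm[i, :].sum() - tp
        fp = cm[:, i].sum() - tp
        tn = total - tp - fn - fp
        rates.append(tn / (tn + fp) if (tn + fp) else 0.0)
    return sum(rates) / len(rates)

# Toy labels: 1 = note documents an adverse event, 0 = no AE.
trial_labels = [0, 0, 1, 0, 1, 0, 0, 0]  # derived from clinical trial data
llm_labels   = [0, 1, 1, 0, 1, 1, 0, 0]  # note-level LLM predictions

print("macro recall:     ", recall_score(trial_labels, llm_labels, average="macro"))
print("macro precision:  ", precision_score(trial_labels, llm_labels, average="macro"))
print("macro F1:         ", f1_score(trial_labels, llm_labels, average="macro"))
print("accuracy:         ", accuracy_score(trial_labels, llm_labels))
print("macro specificity:", macro_specificity(trial_labels, llm_labels))

Macro averaging, as used here, weights the AE and no-AE classes equally regardless of class imbalance, which is consistent with the abstract's report of a high macro recall alongside a low precision on a dataset where AE notes are presumably rare.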