Adverse Event Prediction Using Natural Language Processing (NLP)

Bibliographic Details
Published in: ProQuest Dissertations and Theses (2025)
Main Author: Ahmed, Nishat
Published: ProQuest Dissertations & Theses
Subjects: Computer science; Computer engineering; Oncology; Bioinformatics
Online Access: Citation/Abstract
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3201915407
003 UK-CbPIL
020 |a 9798314872345 
035 |a 3201915407 
045 2 |b d20250101  |b d20251231 
084 |a 66569  |2 nlm 
100 1 |a Ahmed, Nishat 
245 1 |a Adverse Event Prediction Using Natural Language Processing (NLP) 
260 |b ProQuest Dissertations & Theses  |c 2025 
513 |a Dissertation/Thesis 
520 3 |a The increasing availability of digital health records offers unprecedented insight into the determinants of cancer outcomes. Nevertheless, methods to efficiently and accurately identify adverse events in large retrospective studies are limited. Natural language processing (NLP) is a potential solution for extracting valuable information from patient data, which is typically stored as unstructured text in isolated datasets within hospital systems. Given the success of large language models (LLMs) in natural language comprehension tasks, we explored the generation of a large labeled clinical notes dataset using a generative LLM with prompt engineering, and the efficacy of using this labeled dataset for fine-tuning encoder-based LLMs. By comparing the predictions of the generative LLM LLaMA 70B against a small sample of clinical notes annotated by an oncologist, we determined that it produces accurate note-level predictions of adverse event (AE) occurrence. We therefore used LLaMA 70B to annotate a dataset of 7,345 patients (412,530 clinical notes) from the MSK-IMPACT dataset. The performance of this annotated dataset in fine-tuning ModernBERT and Clinical Longformer to predict AE occurrence was compared with that of fine-tuning the same models on a version of the dataset annotated using clinical trial data. In this study, precision and recall scores of 0.80 are considered acceptable, as they reflect an optimal balance between accurate predictions and sufficient sensitivity. Our results indicate that the labels produced by LLaMA 70B outperform the labels derived from clinical trial data. To evaluate the quality of the LLaMA 70B-generated labels, we compared its predictions to our clinical trial data. On our training set of 5,875 patients, LLaMA 70B achieved a macro-averaged recall of 0.90, accuracy of 0.71, precision of 0.07, F1-score of 0.13, and specificity of 0.70. The evaluation metrics were similar for the test set. We find that LLaMA 70B note-level predictions serve as better labels than our clinical trial note-level labels, as both ModernBERT and Clinical Longformer performed better when trained and tested on the LLaMA 70B labels. Although we attempted to improve patient-level predictions against ground-truth patient-level clinical trial labels by smoothing the LLaMA 70B predictions and by further prompt engineering, these methods did not yield notable improvements. We find that manual inspection of LLaMA's note-level predictions by a medical expert is the best way to validate them, and the most effective approach to creating a clinical notes dataset with high-quality labels is to have medical experts manually annotate the notes. 
653 |a Computer science 
653 |a Computer engineering 
653 |a Oncology 
653 |a Bioinformatics 
773 0 |t ProQuest Dissertations and Theses  |g (2025) 
786 0 |d ProQuest  |t Publicly Available Content Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3201915407/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3201915407/fulltextPDF/embedded/L8HZQI7Z43R0LA5T?source=fedsrch
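
Illustrative example

The abstract in the 520 field above describes a two-stage pipeline: a generative LLM (LLaMA 70B) labels clinical notes for adverse-event occurrence via prompt engineering, and those note-level labels are then scored against clinical trial annotations using macro-averaged recall, precision, F1, accuracy, and specificity. The Python sketch below shows only the general shape of that evaluation step under stated assumptions; the prompt wording, variable names, and toy labels are illustrative and are not taken from the thesis.

# Illustrative sketch (assumptions, not the thesis code): score note-level
# adverse-event labels from a generative LLM against reference labels using
# the macro-averaged metrics named in the abstract.
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# A hypothetical zero-shot labeling prompt of the kind the abstract alludes
# to; the actual prompt engineering used in the thesis is not reproduced here.
PROMPT = (
    "Read the clinical note below and answer 'yes' if it documents an "
    "adverse event, otherwise 'no'.\n\nNote:\n{note}"
)

def macro_specificity(y_true, y_pred, labels=(0, 1)):
    """Macro-averaged specificity: mean true-negative rate over classes."""
    cm = confusion_matrix(y_true, y_pred, labels=list(labels))
    total = cm.sum()
    rates = []
    for i in range(len(labels)):
        tp = cm[i, i]
        fn = cm[i, :].sum() - tp
        fp = cm[:, i].sum() - tp
        tn = total - tp - fn - fp
        rates.append(tn / (tn + fp) if (tn + fp) else 0.0)
    return sum(rates) / len(rates)

# Toy labels: 1 = note documents an adverse event, 0 = no AE.
trial_labels = [0, 0, 1, 0, 1, 0, 0, 0]  # derived from clinical trial data
llm_labels   = [0, 1, 1, 0, 1, 1, 0, 0]  # note-level LLM predictions

print("macro recall:     ", recall_score(trial_labels, llm_labels, average="macro"))
print("macro precision:  ", precision_score(trial_labels, llm_labels, average="macro"))
print("macro F1:         ", f1_score(trial_labels, llm_labels, average="macro"))
print("accuracy:         ", accuracy_score(trial_labels, llm_labels))
print("macro specificity:", macro_specificity(trial_labels, llm_labels))

Macro averaging, as used here, weights the AE and no-AE classes equally regardless of class imbalance, which is consistent with the abstract's report of a high macro recall alongside a low precision on a dataset where AE notes are presumably rare.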