Maximizing Learning Efficiency With Limited Labeled Data: Applications to Healthcare and Education

Bibliographic Details
Published in:	ProQuest Dissertations and Theses (2025)
Main Author:	Enayati, Saman
Published:	ProQuest Dissertations & Theses
Subjects:	Computer science; Artificial intelligence; Computer engineering
Links:	Citation/Abstract; Full Text - PDF

MARC


LEADER	00000nab a2200000uu 4500
001	3213154982
003	UK-CbPIL
020			\|a 9798315760146
035			\|a 3213154982
045	2		\|b d20250101 \|b d20251231
084			\|a 66569 \|2 nlm
100	1		\|a Enayati, Saman
245	1		\|a Maximizing Learning Efficiency With Limited Labeled Data: Applications to Healthcare and Education
260			\|b ProQuest Dissertations & Theses \|c 2025
513			\|a Dissertation/Thesis
520	3		\|a Document classification is essential in domains such as healthcare and education, encompassing three major steps: annotation, training accurate models, and evaluation. Each of these steps is labor-intensive and time-consuming, requiring substantial amounts of labeled data, which is both costly and resource-demanding. This dissertation addresses these challenges by presenting innovative methodologies to enhance annotation efficiency, model training in low-resource settings, and automated scoring. In the healthcare domain, we tackle the challenges of annotation and training with limited resources. First, we develop a visualization approach for rapid labeling of clinical notes for smoking status extraction. The annotation process is labor-intensive and time-consuming; thus, we introduce a tool that accelerates annotation by clustering similar sentences and highlighting important keywords. This reduces the cognitive load on annotators, resulting in faster and more efficient labeling. Next, we address the problem of training accurate classifiers in low-resource settings with limited labeled data. In our first approach, MERIT (Minimal Supervision Through Label Augmentation for Biomedical Relation Extraction), we propose using shortest dependency path (SDP) representation and specific distance thresholds to propagate labels and augment high-quality labeled data. This method improves classifier accuracy compared to using limited labeled data alone. We extend this in our second approach by developing an iterative algorithm to learn automatic thresholds for label propagation. This method is tested in various scenarios, including semi-supervised learning, supervised learning, and in-context learning, demonstrating significant improvements in model performance. In the education domain, we focus on the problem of assessing narratives generated by school-aged children, a task that is both expensive and time-consuming for teachers. We leverage large language models (LLMs) to learn the scoring patterns of teachers accurately, offering a reliable tool for automated narrative scoring. This approach reduces the subjectivity and resource requirements of manual scoring, providing a scalable and consistent alternative. Experimental results across these methodologies demonstrate their effectiveness in improving annotation speed, data utilization, and model accuracy. This dissertation contributes to advancing document classification in low-resource settings, offering practical solutions for critical tasks in healthcare and education.
653			\|a Computer science
653			\|a Artificial intelligence
653			\|a Computer engineering
773	0		\|t ProQuest Dissertations and Theses \|g (2025)
786	0		\|d ProQuest \|t ProQuest Dissertations & Theses Global
856	4	1	\|3 Citation/Abstract \|u https://www.proquest.com/docview/3213154982/abstract/embedded/6A8EOT78XXH2IG52?source=fedsrch
856	4	0	\|3 Full Text - PDF \|u https://www.proquest.com/docview/3213154982/fulltextPDF/embedded/6A8EOT78XXH2IG52?source=fedsrch