Identification of dietary supplement use from electronic health records using transformer-based language models

Bibliographic details
Published in: BMC Medical Informatics and Decision Making vol. 22 (2025), p. 1-12
Main author: Zhou, Sicheng
Other authors: Schutte, Dalton; Xing, Aiwen; Chen, Jiyang; Wolfson, Julian; He, Zhe; Yu, Fang; Zhang, Rui
Published: Springer Nature B.V.
Description
Abstract:
Background: Alzheimer's disease (AD) and related dementias (ADRD) are common in older adults, and their prevention and management remain challenging. Dietary supplements (DS) have emerged as a promising way to prevent or delay ADRD; however, the effect of DS use on disease progression in patients with cognitive impairment remains unclear. Little clinical trial evidence is available, but electronic health records (EHR) contain substantial information, in both structured and unstructured form, about patients' DS use and disease status. The objectives of this study were to (1) develop accurate natural language processing (NLP) methods to extract DS use for patients with mild cognitive impairment (MCI) and ADRD, (2) examine the coverage of DS in structured versus unstructured data, and (3) compare DS use information in EHR with National Health and Nutrition Examination Survey (NHANES) data.
Methods: We collected EHR data for patients with MCI and ADRD and developed a pipeline to extract DS use information from both structured data and unstructured clinical notes. For structured data, we used the medication table to identify DS. For unstructured clinical notes, we fine-tuned Bidirectional Encoder Representations from Transformers (BERT) models to extract DS use status.
Results: The best named entity recognition model for DS achieved an F1-score of 0.964, and the PubMed BERT-based use status classifier had a weighted F1-score of 0.879. We applied these models to extract DS use information from unstructured clinical notes, then compared and combined the results with those from structured medication orders. In total, 125 unique DS were identified for patients with MCI and 108 unique DS were identified for patients with ADRD.
Conclusions: In this study, we developed an NLP-based pipeline to extract DS use information from structured medication data and clinical notes in EHR for patients with MCI and ADRD. Our method could help clarify DS use among patients with MCI and ADRD and how these DS may influence the diseases.
ISSN:1472-6947
DOI:10.1186/s12911-025-03252-9
Source: Healthcare Administration Database