Breaking Digital Health Barriers Through a Large Language Model–Based Tool for Automated Observational Medical Outcomes Partnership Mapping: Development and Validation Study

Bibliographic details
Published in: Journal of Medical Internet Research vol. 27 (2025), p. e69004
First author: Adams, Meredith CB
Other authors: Perkins, Matthew L, Hudson, Cody, Madhira, Vithal, Akbilgic, Oguz, Ma, Da, Hurley, Robert W, Topaloglu, Umit
Published: Gunther Eysenbach MD MPH, Associate Professor
Subjects:
Online access: Citation/Abstract
Full Text + Graphics
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3222369150
003 UK-CbPIL
022 |a 1438-8871 
024 7 |a 10.2196/69004  |2 doi 
035 |a 3222369150 
045 2 |b d20250101  |b d20251231 
100 1 |a Adams, Meredith CB 
245 1 |a Breaking Digital Health Barriers Through a Large Language Model–Based Tool for Automated Observational Medical Outcomes Partnership Mapping: Development and Validation Study 
260 |b Gunther Eysenbach MD MPH, Associate Professor  |c 2025 
513 |a Journal Article 
520 3 |a Background: The integration of diverse clinical data sources requires standardization through models such as Observational Medical Outcomes Partnership (OMOP). However, mapping data elements to OMOP concepts demands significant technical expertise and time. While large health care systems often have resources for OMOP conversion, smaller clinical trials and studies frequently lack such support, leaving valuable research data siloed. Objective: This study aims to develop and validate a user-friendly tool that leverages large language models to automate the OMOP conversion process for clinical trials, electronic health records, and registry data. Methods: We developed a 3-tiered semantic matching system using GPT-3 embeddings to transform heterogeneous clinical data to the OMOP Common Data Model. The system processes input terms by generating vector embeddings, computing cosine similarity against precomputed Observational Health Data Sciences and Informatics vocabulary embeddings, and ranking potential matches. We validated the system using two independent datasets: (1) a development set of 76 National Institutes of Health Helping to End Addiction Long-term Initiative clinical trial common data elements for chronic pain and opioid use disorders and (2) a separate validation set of electronic health record concepts from the National Institutes of Health National COVID Cohort Collaborative COVID-19 enclave. The architecture combines Unified Medical Language System semantic frameworks with asynchronous processing for efficient concept mapping, made available through an open-source implementation. Results: The system achieved an area under the receiver operating characteristic curve of 0.9975 for mapping clinical trial common data element terms. Precision ranged from 0.92 to 0.99 and recall ranged from 0.88 to 0.97 across similarity thresholds from 0.85 to 1.0.
In practical application, the tool successfully automated mappings that previously required manual informatics expertise, reducing the technical barriers for research teams to participate in large-scale, data-sharing initiatives. Representative mappings demonstrated high accuracy, such as demographic terms achieving 100% similarity with corresponding Logical Observation Identifiers Names and Codes concepts. The implementation successfully processes diverse data types through both individual term mapping and batch processing capabilities. Conclusions: Our validated large language model–based tool effectively automates the transformation of clinical data into the OMOP format while maintaining high accuracy. The combination of semantic matching capabilities and a researcher-friendly interface makes data harmonization accessible to smaller research teams without requiring extensive informatics support. This has direct implications for accelerating clinical research data standardization and enabling broader participation in initiatives such as the National Institutes of Health Helping to End Addiction Long-term Initiative Data Ecosystem.
653 |a Experts 
653 |a Collaboration 
653 |a Long term 
653 |a Opioids 
653 |a Clinical research 
653 |a Concept mapping 
653 |a Conversion 
653 |a Mapping 
653 |a Automation 
653 |a Teams 
653 |a Clinical trials 
653 |a Health initiatives 
653 |a Vocabulary 
653 |a Standardization 
653 |a Substance use disorder 
653 |a Clinical outcomes 
653 |a Validation studies 
653 |a COVID-19 
653 |a Medical research 
653 |a Addictions 
653 |a Computerized medical records 
653 |a Chronic pain 
653 |a Harmonization 
653 |a Health records 
653 |a Health information 
653 |a Language 
653 |a Large language models 
653 |a Accuracy 
653 |a Medical records 
653 |a Research 
653 |a Drug addiction 
653 |a Health 
653 |a Medical language 
653 |a Validity 
653 |a Semantic processing 
653 |a Data 
653 |a Maps 
653 |a Institutes 
653 |a Data processing 
653 |a Health services 
653 |a Disorders 
653 |a Language shift 
653 |a Language modeling 
653 |a Economic development 
653 |a Health care 
700 1 |a Perkins, Matthew L 
700 1 |a Hudson, Cody 
700 1 |a Madhira, Vithal 
700 1 |a Akbilgic, Oguz 
700 1 |a Ma, Da 
700 1 |a Hurley, Robert W 
700 1 |a Topaloglu, Umit 
773 0 |t Journal of Medical Internet Research  |g vol. 27 (2025), p. e69004 
786 0 |d ProQuest  |t Library Science Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3222369150/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text + Graphics  |u https://www.proquest.com/docview/3222369150/fulltextwithgraphics/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3222369150/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch
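The Methods and Results in the abstract (field 520) describe scoring each input term against precomputed OHDSI vocabulary embeddings by cosine similarity, ranking the candidate concepts, and reporting precision and recall at similarity thresholds from 0.85 to 1.0. The following is a minimal Python sketch of that scoring, thresholding, and evaluation step, assuming the embeddings are already available as plain lists of floats. The toy vectors, the concept labels (`concept_A` and so on), the function names (`cosine_similarity`, `rank_candidates`, `precision_recall`), and the precision/recall definitions used here are hypothetical illustrations; they are not taken from the authors' open-source implementation, which also includes a 3-tiered matching design and UMLS components not modeled here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(
    query: Sequence[float],
    vocabulary: Dict[str, Sequence[float]],
    threshold: float = 0.85,
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: score every precomputed vocabulary embedding against
    the query embedding, drop candidates below the threshold, and return the
    top_k remaining (concept, similarity) pairs, best first."""
    scored = [(concept, cosine_similarity(query, vec)) for concept, vec in vocabulary.items()]
    kept = [(concept, sim) for concept, sim in scored if sim >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


def precision_recall(predicted: Dict[str, str], gold: Dict[str, str]) -> Tuple[float, float]:
    """Hypothetical evaluation (assumed definitions):
    predicted maps term -> proposed concept (terms with no match above threshold are absent);
    gold maps term -> correct concept.
    Precision = correct proposals / all proposals; recall = correct proposals / all gold terms."""
    true_pos = sum(1 for term, concept in predicted.items() if gold.get(term) == concept)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; real embedding vectors are far higher-dimensional.
    vocab = {
        "concept_A": [1.0, 0.0, 0.0],
        "concept_B": [0.9, 0.1, 0.0],
        "concept_C": [0.0, 1.0, 0.0],
    }
    print(rank_candidates([1.0, 0.05, 0.0], vocab, threshold=0.85))
```

With these toy vectors, `concept_A` ranks first and `concept_B` second, while `concept_C` falls below the 0.85 threshold and is dropped. Raising the threshold toward 1.0 trades recall for precision, which is the trade-off the abstract's reported ranges describe.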