Automating Candidate Gene Prioritization with Large Language Models: Development and Benchmarking of an API-Driven Workflow Leveraging GPT-4
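| In the publication: | bioRxiv (Dec 16, 2024) |
|---|---|
| Main author: | Khan, Taushif |
| Other authors: | Toufiq, Mohammed; Yurieva, Marina; Indrawattana, Nitaya; Jittmittraphap, Akanitt; Kosoltanapiwat, Nathamon; Pumirat, Pornpan; Sukphopetch, Passanesh; Vanaporn, Muthita; Kaber, Basirudeen; Palucka, Karolina; Rinchai, Darawan; Chaussabel, Damien |
| Published: | Cold Spring Harbor Laboratory Press |
| Subjects: | Sepsis; Application programming interface; Models; Automation; Genes; Large language models; Gene clusters |
| Links: | Citation/Abstract; Full Text - PDF; Full text outside of ProQuest |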
MARC
| Field | Ind. 1 | Ind. 2 | Content |
|---|---|---|---|
| LEADER | | | 00000nab a2200000uu 4500 |
| 001 | | | 3145269254 |
| 003 | | | UK-CbPIL |
| 022 | | | \|a 2692-8205 |
| 024 | 7 | | \|a 10.1101/2024.12.10.627808 \|2 doi |
| 035 | | | \|a 3145269254 |
| 045 | 0 | | \|b d20241216 |
| 100 | 1 | | \|a Khan, Taushif |
| 245 | 1 | | \|a Automating Candidate Gene Prioritization with Large Language Models: Development and Benchmarking of an API-Driven Workflow Leveraging GPT-4 |
| 260 | | | \|b Cold Spring Harbor Laboratory Press \|c Dec 16, 2024 |
| 513 | | | \|a Working Paper |
| 520 | 3 | | \|a In this exploratory study, we developed an automated workflow that leverages Large Language Models, specifically GPT-4, to prioritize candidate genes for targeted assay development. The workflow automates interaction with OpenAI models and enables prompt creation and submission. It features customizable prompts designed to evaluate candidate genes based on criteria such as association with biological processes, biomarker potential, and therapeutic implications, which can be tailored for specific diseases or processes. Benchmarking experiments comparing the performance of the Application Programming Interface (API)-based automated prompting approach with manual prompting demonstrated high consistency and reproducibility in gene prioritization results. The automated method exhibited scalability by successfully prioritizing genes relevant to sepsis from the BloodGen3 repertoire, which comprises 11,465 genes distributed among 382 modules. The workflow efficiently identified sepsis-associated genes across the repertoire, revealing distinct gene clusters and providing insights into their distribution within module aggregates and individual modules. This proof-of-concept study demonstrates how LLMs can enhance gene prioritization, streamlining the identification process for targeted assays across various biological contexts. However, it also reveals the need for further validation and highlights the exploratory nature of this work due to scoring inconsistencies and the necessity for manual fact-checking. Despite these challenges, the automated workflow holds promise for accelerating targeted assay development for disease management and paves the way for future research. Competing Interest Statement: The authors have declared no competing interest. |
| 653 | | | \|a Sepsis |
| 653 | | | \|a Application programming interface |
| 653 | | | \|a Models |
| 653 | | | \|a Automation |
| 653 | | | \|a Genes |
| 653 | | | \|a Large language models |
| 653 | | | \|a Gene clusters |
| 700 | 1 | | \|a Toufiq, Mohammed |
| 700 | 1 | | \|a Yurieva, Marina |
| 700 | 1 | | \|a Indrawattana, Nitaya |
| 700 | 1 | | \|a Jittmittraphap, Akanitt |
| 700 | 1 | | \|a Kosoltanapiwat, Nathamon |
| 700 | 1 | | \|a Pumirat, Pornpan |
| 700 | 1 | | \|a Sukphopetch, Passanesh |
| 700 | 1 | | \|a Vanaporn, Muthita |
| 700 | 1 | | \|a Kaber, Basirudeen |
| 700 | 1 | | \|a Palucka, Karolina |
| 700 | 1 | | \|a Rinchai, Darawan |
| 700 | 1 | | \|a Chaussabel, Damien |
| 773 | 0 | | \|t bioRxiv \|g (Dec 16, 2024) |
| 786 | 0 | | \|d ProQuest \|t Biological Science Database |
| 856 | 4 | 1 | \|3 Citation/Abstract \|u https://www.proquest.com/docview/3145269254/abstract/embedded/ZKJTFFSVAI7CB62C?source=fedsrch |
| 856 | 4 | 0 | \|3 Full Text - PDF \|u https://www.proquest.com/docview/3145269254/fulltextPDF/embedded/ZKJTFFSVAI7CB62C?source=fedsrch |
| 856 | 4 | 0 | \|3 Full text outside of ProQuest \|u https://www.biorxiv.org/content/10.1101/2024.12.10.627808v1 |