scBaseCamp: An AI agent-curated, uniformly processed, and continually expanding single cell data repository

Bibliographic Details
Published in: bioRxiv (Mar 4, 2025)
Main Author: Youngblut, Nicholas D
Other Authors: Carpenter, Christopher; Prashar, Jaanak; Ricci-Tam, Chiara; Ilango, Rajesh; Teyssier, Noam; Konermann, Silvana; Hsu, Patrick; Dobin, Alexander; Burke, David P; Goodarzi, Hani; Roohani, Yusuf H
Published: Cold Spring Harbor Laboratory Press
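Subjects: Data processing; Ribonucleic acid--RNA; Artificial intelligence; Gene expression; Genomics; Advisors; Cell culture; Boards of directors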
Online Access: Citation/Abstract
Full Text - PDF
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3173595147
003 UK-CbPIL
022 |a 2692-8205 
024 7 |a 10.1101/2025.02.27.640494  |2 doi 
035 |a 3173595147 
045 0 |b d20250304 
100 1 |a Youngblut, Nicholas D 
245 1 |a scBaseCamp: An AI agent-curated, uniformly processed, and continually expanding single cell data repository 
260 |b Cold Spring Harbor Laboratory Press  |c Mar 4, 2025 
513 |a Working Paper 
520 3 |a Building a virtual model of the cell is an emerging frontier at the intersection of artificial intelligence and biology, aided by the rapid growth of single-cell RNA sequencing data. By aggregating gene expression profiles from millions of cells across hundreds of studies, single cell atlases have provided a foundation for training AI-driven models of the cell. However, reliance on datasets with pre-processed counts limits the size and diversity of these repositories and constrains downstream model training to data curated for divergent purposes. This introduces analytical variability due to differences in the choice of alignment tools, genome references, and counting strategies. Here, we introduce scBaseCamp, a continuously updated single-cell RNA-seq database that leverages an AI agent-driven hierarchical workflow to automate discovery, metadata extraction, and standardized data processing. Built by directly mining and processing all publicly accessible 10X Genomics single-cell RNA sequencing reads, scBaseCamp is currently the largest public repository of single-cell data, comprising over 230 million cells spanning 21 organisms and 72 tissues. Using studies comprised of both single cell and single nucleus sequencing data, we demonstrate that uniform processing across datasets helps mitigate analytical artifacts introduced by inconsistent data processing choices. This standardized approach lays the groundwork for more accurate virtual cell models and serves as a foundation for a wide range of biological and biomedical applications. Competing Interest Statement: D.P.B. acknowledges outside interest as a Google Advisor. H.G. acknowledges outside interest as a co-founder of Exai Bio, Vevo Therapeutics, and Therna Therapeutics, serves on the board of directors at Exai Bio, and is a scientific advisory board member for Verge Genomics and Deep Forest Biosciences. P.D.H. acknowledges outside interest as a co-founder of Terrain Biosciences, Stylus Medicine, and Spotlight Therapeutics, serves on the board of directors at Stylus Medicine, is a board observer at EvolutionaryScale and Terrain Biosciences, a scientific advisory board member at Arbor Biosciences and Veda Bio, and an advisor to NFDG, Varda Space, and Vial Health. All other authors declare no competing interests. 
653 |a Data processing 
653 |a Ribonucleic acid--RNA 
653 |a Artificial intelligence 
653 |a Gene expression 
653 |a Genomics 
653 |a Advisors 
653 |a Cell culture 
653 |a Boards of directors 
700 1 |a Carpenter, Christopher 
700 1 |a Prashar, Jaanak 
700 1 |a Ricci-Tam, Chiara 
700 1 |a Ilango, Rajesh 
700 1 |a Teyssier, Noam 
700 1 |a Konermann, Silvana 
700 1 |a Hsu, Patrick 
700 1 |a Dobin, Alexander 
700 1 |a Burke, David P 
700 1 |a Goodarzi, Hani 
700 1 |a Roohani, Yusuf H 
773 0 |t bioRxiv  |g (Mar 4, 2025) 
786 0 |d ProQuest  |t Biological Science Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3173595147/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3173595147/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u https://www.biorxiv.org/content/10.1101/2025.02.27.640494v1