Enabling scalable single-cell transcriptomic analysis through distributed computing with Apache Spark

Bibliographic Details
Published in: Scientific Reports (Nature Publisher Group) vol. 15, no. 1 (2025), p. 27713-27729
Main author: Adil, Asif
Other authors: Bhattacharya, Namrata; Aadam; Khan, Naveed Jeelani; Asger, Mohammed
Published: Nature Publishing Group
Online access: Citation/Abstract; Full Text; Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3234544117
003 UK-CbPIL
022 |a 2045-2322 
024 7 |a 10.1038/s41598-025-12897-5  |2 doi 
035 |a 3234544117 
045 2 |b d20250101  |b d20251231 
084 |a 274855  |2 nlm 
100 1 |a Adil, Asif  |u Department of Computer Sciences, Baba Ghulam Shah Badshah University, Rajouri, India (ROR: https://ror.org/00fp2m518) (GRID: grid.449274.8) (ISNI: 0000 0004 1772 8436); Department of Pathology and Laboratory Medicine, School of Medicine, Indiana University Indianapolis, Indianapolis, IN, USA (ROR: https://ror.org/05gxnyn08) (GRID: grid.257413.6) (ISNI: 0000 0001 2287 3919); Department of Pathology and Laboratory Medicine, Indiana University Indianapolis, Indianapolis, IN, USA (ROR: https://ror.org/03eftgw80) 
245 1 |a Enabling scalable single-cell transcriptomic analysis through distributed computing with Apache spark 
260 |b Nature Publishing Group  |c 2025 
513 |a Journal Article 
520 3 |a As the field of single-cell genomics continues to develop, the generation of large-scale scRNA-seq datasets has become more prevalent. Although these datasets offer tremendous potential for shedding light on the complex biology of individual cells, the sheer volume of data presents significant challenges for management and analysis. Recently, a new discipline known as “big single-cell data science” has emerged to address these challenges. Within this field, a variety of computational tools have been developed to facilitate the processing and interpretation of scRNA-seq data. However, several of these tools focus primarily on the analytical aspect and tend to overlook the burgeoning data deluge generated by scRNA-seq experiments. In this study, we address this challenge by presenting a novel parallel analytical framework, scSPARKL, that leverages the power of Apache Spark to enable the efficient analysis of single-cell transcriptomic data. scSPARKL is fortified by a rich set of staged algorithms developed to optimize Apache Spark’s work environment. The tool incorporates six key operations for dealing with single-cell Big Data: data reshaping, data preprocessing, cell/gene filtering, data normalization, dimensionality reduction, and clustering. By utilizing Spark’s unlimited scalability, fault tolerance, and parallelism, the tool enables researchers to rapidly and accurately analyze scRNA-seq datasets of any size. We demonstrate the utility of our framework and algorithms through a series of experiments on real-world scRNA-seq data. Overall, our results suggest that scSPARKL represents a powerful and flexible tool for the analysis of single-cell transcriptomic data, with broad applications across the fields of biology and medicine. 
653 |a Machine learning 
653 |a Big Data 
653 |a Cells 
653 |a Random access memory 
653 |a Gene expression 
653 |a Datasets 
653 |a Algorithms 
653 |a Working conditions 
653 |a Quality control 
653 |a Medical research 
653 |a Data analysis 
653 |a Transcriptomics 
653 |a Genomics 
653 |a Clustering 
653 |a Fault tolerance 
653 |a Distributed processing 
653 |a Environmental 
700 1 |a Bhattacharya, Namrata  |u Department of Computer Science and Engineering, Indraprastha Institute of Information Technology, New Delhi, India (ROR: https://ror.org/03vfp4g33) (GRID: grid.454294.a) (ISNI: 0000 0004 1773 2689); Australian Prostate Cancer Research Center, Queensland University of Technology, Brisbane, Australia (ROR: https://ror.org/03pnv4752) (GRID: grid.1024.7) (ISNI: 0000 0000 8915 0953) 
700 1 |a Aadam  |u Department of Computer Science, Luddy School of Informatics, Indiana University Indianapolis, Indianapolis, IN, USA (ROR: https://ror.org/03eftgw80) 
700 1 |a Khan, Naveed Jeelani  |u Department of Computer Science and Engineering, Model Institute of Engineering and Technology, Jammu, Jammu and Kashmir, India (ROR: https://ror.org/02retg991) (GRID: grid.412986.0) (ISNI: 0000 0001 0705 4560) 
700 1 |a Asger, Mohammed  |u Department of Computer Science and Engineering, Model Institute of Engineering and Technology, Jammu, Jammu and Kashmir, India (ROR: https://ror.org/02retg991) (GRID: grid.412986.0) (ISNI: 0000 0001 0705 4560) 
773 0 |t Scientific Reports (Nature Publisher Group)  |g vol. 15, no. 1 (2025), p. 27713-27729 
786 0 |d ProQuest  |t Science Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3234544117/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text  |u https://www.proquest.com/docview/3234544117/fulltext/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3234544117/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch