Delineating the effective use of self-supervised learning in single-cell genomics

Bibliographic Details
Published in:	Nature Machine Intelligence vol. 7, no. 1 (Jan 2025), p. 68-82
Main Author:	Richter, Till
Other Authors:	Bahrami, Mojtaba; Xia, Yufan; Fischer, David S.; Theis, Fabian J.
Published:	Nature Publishing Group
Subjects:	Data analysis; Datasets; Self-supervised learning; Computer vision; Data integration; Empirical analysis; Genomics; Natural language processing; Genes; Representations; Benchmarks; Task complexity
Online Access:	Citation/Abstract; Full Text; Full Text - PDF

MARC


LEADER	00000nab a2200000uu 4500
001	3159907619
003	UK-CbPIL
022			\|a 2522-5839
024	7		\|a 10.1038/s42256-024-00934-3 \|2 doi
035			\|a 3159907619
045	2		\|b d20250101 \|b d20250131
100	1		\|a Richter, Till \|u Department of Computational Health, Institute of Computational Biology, Helmholtz Munich, Munich, Germany (ROR: https://ror.org/00cfam450) (GRID: grid.4567.0) (ISNI: 0000 0004 0483 2525); TUM School of Computation, Information and Technology, Technical University of Munich, Munich, Germany (ROR: https://ror.org/02kkvpp62) (GRID: grid.6936.a) (ISNI: 0000 0001 2322 2966)
245	1		\|a Delineating the effective use of self-supervised learning in single-cell genomics
260			\|b Nature Publishing Group \|c Jan 2025
513			\|a Journal Article
520	3		\|a Self-supervised learning (SSL) has emerged as a powerful method for extracting meaningful representations from vast, unlabelled datasets, transforming computer vision and natural language processing. In single-cell genomics (SCG), representation learning offers insights into the complex biological data, especially with emerging foundation models. However, identifying scenarios in SCG where SSL outperforms traditional learning methods remains a nuanced challenge. Furthermore, selecting the most effective pretext tasks within the SSL framework for SCG is a critical yet unresolved question. Here we address this gap by adapting and benchmarking SSL methods in SCG, including masked autoencoders with multiple masking strategies and contrastive learning methods. Models trained on over 20 million cells were examined across multiple downstream tasks, including cell-type prediction, gene-expression reconstruction, cross-modality prediction and data integration. Our empirical analyses underscore the nuanced role of SSL, namely, in transfer learning scenarios leveraging auxiliary data or analysing unseen datasets. Masked autoencoders excel over contrastive methods in SCG, diverging from computer vision trends. Moreover, our findings reveal the notable capabilities of SSL in zero-shot settings and its potential in cross-modality prediction and data integration. In summary, we study SSL methods in SCG on fully connected networks and benchmark their utility across key representation learning scenarios. Self-supervised learning techniques are powerful assets for enabling deep insights into complex, unlabelled single-cell genomic data. Richter et al. here benchmark the applicability of self-supervised architectures into key downstream representation learning scenarios.
653			\|a Data analysis
653			\|a Datasets
653			\|a Self-supervised learning
653			\|a Computer vision
653			\|a Data integration
653			\|a Empirical analysis
653			\|a Genomics
653			\|a Natural language processing
653			\|a Genes
653			\|a Representations
653			\|a Benchmarks
653			\|a Task complexity
700	1		\|a Bahrami, Mojtaba \|u Department of Computational Health, Institute of Computational Biology, Helmholtz Munich, Munich, Germany (ROR: https://ror.org/00cfam450) (GRID: grid.4567.0) (ISNI: 0000 0004 0483 2525); TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany (ROR: https://ror.org/02kkvpp62) (GRID: grid.6936.a) (ISNI: 0000 0001 2322 2966)
700	1		\|a Xia, Yufan \|u TUM School of Computation, Information and Technology, Technical University of Munich, Munich, Germany (ROR: https://ror.org/02kkvpp62) (GRID: grid.6936.a) (ISNI: 0000 0001 2322 2966)
700	1		\|a Fischer, David S. \|u Department of Computational Health, Institute of Computational Biology, Helmholtz Munich, Munich, Germany (ROR: https://ror.org/00cfam450) (GRID: grid.4567.0) (ISNI: 0000 0004 0483 2525); Eric and Wendy Schmidt Center at the Broad Institute, Cambridge, MA, USA (ROR: https://ror.org/05a0ya142) (GRID: grid.66859.34) (ISNI: 0000 0004 0546 1623)
700	1		\|a Theis, Fabian J. \|u Department of Computational Health, Institute of Computational Biology, Helmholtz Munich, Munich, Germany (ROR: https://ror.org/00cfam450) (GRID: grid.4567.0) (ISNI: 0000 0004 0483 2525); TUM School of Computation, Information and Technology, Technical University of Munich, Munich, Germany (ROR: https://ror.org/02kkvpp62) (GRID: grid.6936.a) (ISNI: 0000 0001 2322 2966); TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany (ROR: https://ror.org/02kkvpp62) (GRID: grid.6936.a) (ISNI: 0000 0001 2322 2966)
773	0		\|t Nature Machine Intelligence \|g vol. 7, no. 1 (Jan 2025), p. 68-82
786	0		\|d ProQuest \|t Science Database
856	4	1	\|3 Citation/Abstract \|u https://www.proquest.com/docview/3159907619/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch
856	4	0	\|3 Full Text \|u https://www.proquest.com/docview/3159907619/fulltext/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch
856	4	0	\|3 Full Text - PDF \|u https://www.proquest.com/docview/3159907619/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch