Penalised regression improves imputation of cell-type specific expression using RNA-seq data from mixed cell populations compared to domain-specific methods

Bibliographic Details
Published in: bioRxiv (Jan 24, 2025)
Main author: Wei-Yu, Lin
Other authors: Kartawinata, Melissa, Bethany Rose Jebson, Restuadi, Restuadi, Peckham, Hannah, Radziszewska, Anna, Deakin, Claire, Ciurtin, Coziana, Consortium, Cluster, Wedderburn, Lucy R, Wallace, Chris
Published: Cold Spring Harbor Laboratory Press
Subjects:
Online access: Citation/Abstract
Full Text - PDF
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3159548383
003 UK-CbPIL
022 |a 2692-8205 
024 7 |a 10.1101/2023.09.11.556650  |2 doi 
035 |a 3159548383 
045 0 |b d20250124 
100 1 |a Wei-Yu, Lin 
245 1 |a Penalised regression improves imputation of cell-type specific expression using RNA-seq data from mixed cell populations compared to domain-specific methods 
260 |b Cold Spring Harbor Laboratory Press  |c Jan 24, 2025 
513 |a Working Paper 
520 3 |a Gene expression studies often use bulk RNA sequencing of mixed cell populations because single-cell or sorted-cell sequencing may be prohibitively expensive. However, mixed-cell studies may miss expression patterns that are restricted to specific cell populations. Computational deconvolution can be used to estimate cell fractions from bulk expression data and to infer average cell-type expression in a set of samples (e.g. cases or controls), but imputing sample-level cell-type expression, which is required for more detailed analyses such as relating expression to quantitative traits, is less commonly addressed. Here, we assessed the accuracy of imputing sample-level cell-type expression using a real dataset in which mixed peripheral blood mononuclear cell (PBMC) and sorted (CD4, CD8, CD14, CD19) RNA sequencing data were generated from the same subjects (N=158), and using pseudobulk datasets synthesised from eQTLgen single-cell RNA-seq data. We compared three domain-specific methods, CIBERSORTx, bMIND and debCAM/swCAM, with two cross-domain machine learning methods, multi-response LASSO and ridge, which had not previously been used for this task. We also assessed the methods according to their ability to recover differential gene expression (DGE) results. Compared with the deconvolution methods, LASSO/ridge showed higher sensitivity but lower specificity for recovering DGE signals seen in the observed data, although LASSO/ridge had higher areas under the curve (AUC). Machine learning methods have the potential to outperform domain-specific methods when suitable training data are available. Competing Interest Statement: The CLUSTER consortium has been provided with generous grants from AbbVie and Sobi. CW receives funding from MSD and GSK and is a part-time employee of GSK. These companies had no involvement in the work presented here. Footnotes: * Add contributing authors.
Add Figure 1 to illustrate the application of the multi-response LASSO/ridge model for predicting sample-level cell-type expression. Add benchmarking results based on simulated dichotomous and continuous phenotypes, with and without sex as a covariate in the DGE models, and update the original Figure 4 (now Figure 5) accordingly. Include benchmarking results based on the pseudobulk data in Figure 6. Add four Supplementary Figures (S3, S6, S8, S9) to summarise additional results. Arrange Supplementary Figures in the order they appear in the text. Add Supplementary Tables to summarise the existing methods (S1 Table) and computing usage (S2 and S3 Tables).
653 |a Machine learning 
653 |a Ribonucleic acid--RNA 
653 |a Gene expression 
653 |a CD19 antigen 
653 |a Observational learning 
653 |a Phenotypes 
653 |a Quorum sensing 
653 |a CD14 antigen 
653 |a Blood levels 
653 |a Peripheral blood mononuclear cells 
653 |a Leukocytes (mononuclear) 
653 |a CD4 antigen 
653 |a Population studies 
653 |a CD8 antigen 
653 |a Learning algorithms 
700 1 |a Kartawinata, Melissa 
700 1 |a Bethany Rose Jebson 
700 1 |a Restuadi, Restuadi 
700 1 |a Peckham, Hannah 
700 1 |a Radziszewska, Anna 
700 1 |a Deakin, Claire 
700 1 |a Ciurtin, Coziana 
700 1 |a Consortium, Cluster 
700 1 |a Wedderburn, Lucy R 
700 1 |a Wallace, Chris 
773 0 |t bioRxiv  |g (Jan 24, 2025) 
786 0 |d ProQuest  |t Biological Science Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3159548383/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3159548383/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u https://www.biorxiv.org/content/10.1101/2023.09.11.556650v4
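The abstract describes using multi-response ridge regression to impute sample-level cell-type expression from mixed-cell (bulk PBMC) expression, trained on subjects who have both bulk and sorted-cell data. The sketch below illustrates that general idea under stated assumptions and is not the authors' implementation: rows of `x` are samples and columns are bulk-expression features; rows of `y` are the same samples and columns are expression values measured in one sorted cell type (for example CD4). Both are mean-centred, and the coefficients are the closed-form ridge solution `B = inv(Xc'Xc + lam*I) Xc'Yc` with a single penalty `lam` shared by all response columns. The function names (`fit_multiresponse_ridge`, `predict`) and the pure-Python solver are hypothetical. A LASSO variant would need an iterative solver such as coordinate descent and is not shown.

```python
"""Sketch: multi-response ridge regression mapping bulk (mixed-cell)
expression X (samples x features) to one cell type's expression Y
(samples x genes). Pure Python; all names here are hypothetical."""

from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]
Model = Tuple[Vector, Vector, Matrix]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(u * v for u, v in zip(row, col)) for col in bt] for row in a]


def column_means(a: Matrix) -> Vector:
    return [sum(col) / len(a) for col in zip(*a)]


def center(a: Matrix, means: Vector) -> Matrix:
    return [[v - m for v, m in zip(row, means)] for row in a]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ z = b (b may have several columns) by Gauss-Jordan
    elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("system is singular; use a larger penalty")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_multiresponse_ridge(x: Matrix, y: Matrix, lam: float) -> Model:
    """Closed-form ridge with one penalty shared by every response column,
    fitted on mean-centred data so the intercept is not penalised."""
    x_mean, y_mean = column_means(x), column_means(y)
    xc, yc = center(x, x_mean), center(y, y_mean)
    xt = transpose(xc)
    gram = matmul(xt, xc)
    for i in range(len(gram)):
        gram[i][i] += lam  # add the ridge penalty to the diagonal
    coef = solve(gram, matmul(xt, yc))  # features x responses
    return x_mean, y_mean, coef


def predict(model: Model, x_new: Matrix) -> Matrix:
    """Impute response values for new samples from their bulk profiles."""
    x_mean, y_mean, coef = model
    pred = matmul(center(x_new, x_mean), coef)
    return [[v + m for v, m in zip(row, y_mean)] for row in pred]


if __name__ == "__main__":
    # Toy data: 4 samples, 2 bulk features, 2 cell-type responses.
    bulk = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
    sorted_expr = [[2 * a + b + 3, -a + 0.5 * b] for a, b in bulk]
    model = fit_multiresponse_ridge(bulk, sorted_expr, lam=0.1)
    print(predict(model, [[3.0, 2.0]]))
```

Because the ridge penalty is separable across response columns, fitting all responses together with one shared `lam` gives the same coefficients as fitting each response separately; the sketch simply reuses one solve of `Xc'Xc + lam*I` for every column. In practice the penalty would typically be chosen by cross-validation on the training subjects, and with many more features than samples a direct feature-by-feature solve like this one becomes expensive.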