Correlation-based feature selection of single cell transcriptomics data from multiple sources

Bibliographic Details
Published in: Journal of Big Data vol. 12, no. 1 (Jan 2025), p. 4
Published: Springer Nature B.V.
Online Access: Citation/Abstract
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3152000877
003 UK-CbPIL
022 |a 2196-1115 
024 7 |a 10.1186/s40537-024-01051-z  |2 doi 
035 |a 3152000877 
045 2 |b d20250101  |b d20250131 
245 1 |a Correlation-based feature selection of single cell transcriptomics data from multiple sources 
260 |b Springer Nature B.V.  |c Jan 2025 
513 |a Journal Article 
520 3 |a When applying data mining or machine learning techniques to large and diverse datasets, it is often necessary to construct descriptive and predictive models. Descriptive models are used to discover relationships between the attributes of the data while predictive models identify the characteristics of the data that will be collected in the future. Bioinformatics data is high-dimensional, making it practically impossible to apply the majority of “classical” algorithms for classification and clustering. Even if the algorithms are useful, training with large multidimensional data significantly increases processing time. The algorithms specialized for working with high-dimensional data often cannot process data containing large data sets with several thousand dimensions (features). Dimension reduction methods (such as PCA) do not provide satisfactory results, and also obscure the meaning of the original attributes in the data. For the constructed models to be usable, they must fulfill the requirement of scalability, as the amount of bioinformatics data is increasing rapidly. Furthermore, the significance of individual data features can differ from source to source. This paper describes an attribute selection method for efficient classification of high-dimensional (30,698) transcriptomics data collected from different sources. The proposed method was tested with 22 classification algorithms. The classification results for the selected attribute sets are comparable to the results for the complete attribute set. 
653 |a Multidimensional data 
653 |a Algorithms 
653 |a Data mining 
653 |a Classification 
653 |a Multidimensional methods 
653 |a Machine learning 
653 |a Bioinformatics 
653 |a Prediction models 
653 |a Clustering 
653 |a Big Data 
653 |a Data processing 
653 |a Medical informatics 
653 |a Data 
653 |a Attributes 
653 |a Information retrieval 
773 0 |t Journal of Big Data  |g vol. 12, no. 1 (Jan 2025), p. 4 
786 0 |d ProQuest  |t ABI/INFORM Global 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3152000877/abstract/embedded/160PP4OP4BJVV2EV?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3152000877/fulltextPDF/embedded/160PP4OP4BJVV2EV?source=fedsrch