Comparison of Off-the-Shelf Methods and a Hotelling Multidimensional Approximation for Data Drift Detection

Bibliographic Details
Published in: Machine Learning and Knowledge Extraction vol. 7, no. 1 (2025), p. 2
Main Author: Navarro-Cerdán, J Ramón
Other Authors: Vicent Ortiz Castelló, David Millán Escrivá
Published: MDPI AG
Online Access: Citation/Abstract
Full Text + Graphics
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3181640152
003 UK-CbPIL
022 |a 2504-4990 
024 7 |a 10.3390/make7010002  |2 doi 
035 |a 3181640152 
045 2 |b d20250101  |b d20250331 
100 1 |a Navarro-Cerdán, J Ramón 
245 1 |a Comparison of Off-the-Shelf Methods and a Hotelling Multidimensional Approximation for Data Drift Detection 
260 |b MDPI AG  |c 2025 
513 |a Journal Article 
520 3 |a Data drift can significantly impact the outcome of a model. Early detection of data drift is crucial for ensuring user confidence in predictions. It allows the user to check if a particular model needs retraining using updated data to adapt to the evolving process dynamics. This study compares five different statistical tests, namely four unidimensional and a new multidimensional test (MSPC), to identify data drift in both mean and deviation. While some are designed to detect drift in mean only, like our multidimensional proposal, others respond to changes in both mean and deviation. However, our Hotelling multidimensional method can be trained once and then applied in a single stage to any data stream with several attributes, and it can identify the most relevant variables causing a data drift with one execution, thus avoiding the need for a single univariate test for each attribute. Moreover, our method yields the relative importance of each attribute for drift and allows users to increase or decrease the relative weight of each variable regarding drift detection. It also may be capable of detecting drift due to changes in multivariate interactions. This behavior is especially suitable for real-world scenarios, such as industry, finance, or healthcare environments. 
653 |a Deviation 
653 |a Machine learning 
653 |a Data transmission 
653 |a Methods 
653 |a Datasets 
653 |a Hypothesis testing 
653 |a Algorithms 
653 |a Multidimensional methods 
653 |a Drift 
653 |a Hypotheses 
653 |a Statistical tests 
653 |a Decision making 
653 |a Process controls 
700 1 |a Vicent Ortiz Castelló 
700 1 |a David Millán Escrivá 
773 0 |t Machine Learning and Knowledge Extraction  |g vol. 7, no. 1 (2025), p. 2 
786 0 |d ProQuest  |t Advanced Technologies & Aerospace Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3181640152/abstract/embedded/09EF48XIB41FVQI7?source=fedsrch 
856 4 0 |3 Full Text + Graphics  |u https://www.proquest.com/docview/3181640152/fulltextwithgraphics/embedded/09EF48XIB41FVQI7?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3181640152/fulltextPDF/embedded/09EF48XIB41FVQI7?source=fedsrch