A Cluster-Based Filtering Approach to SCADA Data Preprocessing for Wind Turbine Condition Monitoring and Fault Detection

Bibliographic Details
Published in: Energies vol. 18, no. 22 (2025), p. 5954-5975
Main Author: Kijanowski Krzysztof
Other Authors: Barszcz Tomasz, Dao Phong Ba
Published:
MDPI AG
Subjects:
Online Access: Citation/Abstract
Full Text + Graphics
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3275512299
003 UK-CbPIL
022 |a 1996-1073 
024 7 |a 10.3390/en18225954  |2 doi 
035 |a 3275512299 
045 2 |b d20250101  |b d20251231 
084 |a 231459  |2 nlm 
100 1 |a Kijanowski Krzysztof 
245 1 |a A Cluster-Based Filtering Approach to SCADA Data Preprocessing for Wind Turbine Condition Monitoring and Fault Detection 
260 |b MDPI AG  |c 2025 
513 |a Journal Article 
520 3 |a The high cost of wind turbine maintenance has intensified the need for reliable fault detection and condition monitoring methods. While Supervisory Control and Data Acquisition (SCADA) systems provide valuable operational data, the raw signals often contain noise, outliers, and missing or redundant entries, which can compromise analysis accuracy. This study presents a novel cluster-based outlier removal approach for SCADA data preprocessing, featuring a unique flexibility to include or exclude negative power values—a factor rarely investigated but potentially critical for fault detection performance. The method applies the K-Means++ unsupervised clustering algorithm to group data points along the wind speed–power curve. The number of clusters is determined heuristically using the elbow method, while outliers are identified through Mahalanobis distance with thresholds derived from Chebyshev’s inequality theorem. The approach was validated using SCADA data from a wind farm in Portugal and further assessed with a CUSUM test-based structural change detection method to study how preprocessing choices—outlier thresholds (5% vs. 1%) and inclusion/exclusion of negative power values—affect early fault identification. Results demonstrate reliable fault detection up to 14 days before failure, retaining over 99% of the original dataset. This work provides key insights into preprocessing impacts on model reliability and offers an open-source Python implementation for reproducibility. 
653 |a Turbines 
653 |a Machine learning 
653 |a Nuclear energy 
653 |a Software 
653 |a Accuracy 
653 |a Failure 
653 |a Wind power 
653 |a Hypothesis testing 
653 |a Electricity 
653 |a Fault diagnosis 
653 |a Costs 
653 |a Wind farms 
653 |a Statistical process control 
653 |a Neural networks 
653 |a Sensors 
653 |a Control charts 
653 |a Renewable resources 
653 |a Alternative energy sources 
653 |a Energy resources 
653 |a Nuclear power plants 
653 |a Statistical methods 
653 |a Hydroelectric power 
653 |a Statistical analysis 
700 1 |a Barszcz Tomasz 
700 1 |a Dao Phong Ba 
773 0 |t Energies  |g vol. 18, no. 22 (2025), p. 5954-5975 
786 0 |d ProQuest  |t Publicly Available Content Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3275512299/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text + Graphics  |u https://www.proquest.com/docview/3275512299/fulltextwithgraphics/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3275512299/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch
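
The outlier rule summarized in the abstract (Mahalanobis distance with a threshold derived from Chebyshev's inequality) can be illustrated for a single cluster of (wind speed, power) points. The following is only a minimal sketch, not the authors' open-source implementation: the function names are hypothetical, the data are synthetic, and it assumes the multivariate form of Chebyshev's inequality, P(D >= t) <= p / t^2 for dimension p, which the record does not confirm as the exact form used in the paper. Cluster assignment is assumed to have been done beforehand, for example with K-Means++.

```python
import math


def chebyshev_threshold(alpha, dim=2):
    """Distance t beyond which at most a fraction `alpha` of points can lie.

    Assumption: multivariate Chebyshev bound P(D >= t) <= dim / t**2,
    solved for t at the chosen outlier fraction.
    """
    return math.sqrt(dim / alpha)


def mahalanobis_2d(points):
    """Mahalanobis distance of each (wind_speed, power) pair from the
    sample mean, using the 2x2 sample covariance of `points`."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form [dx dy] S^-1 [dx dy]^T with the 2x2 inverse written out.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(math.sqrt(max(d2, 0.0)))
    return distances


def filter_cluster(points, alpha=0.05):
    """Keep the points whose Mahalanobis distance is within the threshold."""
    t = chebyshev_threshold(alpha)
    return [p for p, d in zip(points, mahalanobis_2d(points)) if d <= t]


# Synthetic cluster: a 10 x 10 grid near 8 m/s and 1000 kW,
# plus one zero-power reading at the same wind speed.
cluster = [(8 + 0.01 * i, 1000.0 + j) for i in range(10) for j in range(10)]
cluster.append((8.045, 0.0))

kept = filter_cluster(cluster, alpha=0.05)  # the zero-power point is dropped
```

Under this assumption, a two-dimensional cluster gets a threshold of about 6.32 at the 5% level and about 14.14 at the 1% level, so the 1% setting removes fewer points. Because the bound holds for any distribution, it is conservative, which is consistent with the abstract's report that over 99% of the data is retained.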