A data-driven approach to forest health assessment through multivariate analysis and machine learning techniques

Bibliographic details
Published in: BMC Plant Biology vol. 25 (2025), p. 1-17
Author: Raja Waqar Ahmed Khan
Other authors: Shaheen, Hamayun; Muhammad Ejaz Ul Islam Dar; Habib, Tariq; Manzoor, Muhammad; Syed Waseem Gillani; Al-Andal, Abeer; Ayoola, John Oluwafemi; Waheed, Muhammad
Published: Springer Nature B.V.
Description
Summary:
Background: Himalayan forests are fragile, rich in biodiversity, and face increasing threats from anthropogenic pressures and climate change. Assessing their health is critical for sustainable forest management. This study integrated ecological indicators (tree density, size, regeneration, deforestation, slope, grazing, and erosion) with machine learning (ML) to classify forest health and identify key drivers across 37 Western Himalayan sites. Principal component analysis (PCA) reduced data dimensionality, highlighting major ecological gradients. K-means clustering, chosen for its efficiency in identifying natural patterns within multivariate data, grouped the forests into three distinct classes based on ecological characteristics. ML models, including Decision Tree (DT), Random Forest (RF), and Support Vector Machine (SVM), were trained and validated using an 80:20 train-test split and 5-fold cross-validation.
Results: PCA revealed that elevation, disturbance, and regeneration explained 74.3% of the variance. Forest health varied across sites, with 10 categorized as healthy, 19 as moderate, and 8 as unhealthy. Forest regeneration was highly skewed (skewness = 2.67) and leptokurtic (kurtosis = 9.8), with few sites showing high seedling abundance, while deforestation (mean = 294 stumps ha⁻¹) indicated uneven human impact. Among ML models, RF showed the best performance with a mean accuracy of 0.83, Kappa 0.87, and balanced accuracy 0.88. SVM followed with 0.75 accuracy, Kappa 0.70, and balanced accuracy 0.81. DT performed worst, with 0.66 accuracy and Kappa 0.45. Cross-validation confirmed that RF had the highest mean accuracy (90.3%), followed by SVM (88.1%) and DT (65.1%). RF-based feature importance analysis showed tree DBH, height, regeneration rate, soil erosion, and tree density as key ecological drivers of forest health.
Conclusions: This study highlights ML-driven classification as a precise, scalable, and objective tool for large-scale forest health assessments. Conservation efforts should prioritize degraded forests through afforestation, slope stabilization, controlled grazing, erosion management, and continuous ecosystem monitoring. Future studies should integrate high-resolution remote sensing (e.g., Landsat, Sentinel-2) and climate datasets (e.g., temperature, precipitation, and drought indices) to enhance predictive capabilities and support long-term forest management planning. The findings underscore the value of data-driven approaches, establishing machine learning as an effective tool to enhance forest monitoring and support evidence-based forest conservation and management in the Himalayas.
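The summary compares models by accuracy, Kappa, and balanced accuracy. As a minimal sketch of how these three metrics relate, the Python below computes them from true and predicted class labels. The label lists are invented toy values, not data from the study, and the function names are hypothetical. Balanced accuracy is taken here as the mean of per-class recall, which is one common convention; the record does not say which definition the authors used.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of sites whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement beyond what chance alone would produce."""
    n = len(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement from the marginal class frequencies.
    p_expected = sum(true_counts[c] * pred_counts[c] for c in labels) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall (assumed definition; others exist)."""
    recalls = []
    for c in sorted(set(y_true)):
        support = sum(t == c for t in y_true)
        hits = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        recalls.append(hits / support)
    return sum(recalls) / len(recalls)


# Toy held-out set of 8 sites (illustrative only, NOT the study's data).
y_true = ["healthy", "healthy", "moderate", "moderate",
          "moderate", "moderate", "unhealthy", "unhealthy"]
y_pred = ["healthy", "moderate", "moderate", "moderate",
          "moderate", "moderate", "unhealthy", "unhealthy"]

print(f"accuracy          = {accuracy(y_true, y_pred):.3f}")
print(f"Cohen's kappa     = {cohen_kappa(y_true, y_pred):.3f}")
print(f"balanced accuracy = {balanced_accuracy(y_true, y_pred):.3f}")
```

Because Kappa discounts chance agreement and balanced accuracy weights each class equally, the three numbers can diverge when classes are uneven in size, as with the 10/19/8 split reported above.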
ISSN:1471-2229
DOI:10.1186/s12870-025-06937-5
Source: Health & Medical Collection