Addressing Data Imbalance in Hydrological Machine Learning: Impact of Advanced Sampling Methods on Performance and Interpretability

Bibliographic Details
Published in: Water Resources Research vol. 61, no. 10 (Oct 1, 2025)
Main Author: Yin, Xiaoran
Other Authors: Shu, Longcang; Wang, Zhe; Zhou, Long; Niu, Shuyao; Ren, Huazhun; Liu, Bo; Lu, Chengpeng
Published:
John Wiley & Sons, Inc.
Subjects:
Online Access: Citation/Abstract
Full Text
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3266095095
003 UK-CbPIL
022 |a 0043-1397 
022 |a 1944-7973 
024 7 |a 10.1029/2024WR039848  |2 doi 
035 |a 3266095095 
045 0 |b d20251001 
084 |a 107315  |2 nlm 
100 1 |a Yin, Xiaoran  |u The National Key Laboratory of Water Disaster Prevention, Hohai University, Nanjing, China 
245 1 |a Addressing Data Imbalance in Hydrological Machine Learning: Impact of Advanced Sampling Methods on Performance and Interpretability 
260 |b John Wiley & Sons, Inc.  |c Oct 1, 2025 
513 |a Journal Article 
520 3 |a Data imbalance poses a severe challenge in hydrological machine learning (ML) applications by limiting model performance and interpretability, yet solutions remain limited. This study evaluates the impact of advanced sampling methods, particularly feature space coverage sampling (FSCS), on model performance in predicting forest cover types and saturated hydraulic conductivity (Ks); the mechanism underlying its efficacy; and its impact on model interpretability. Using ML algorithms such as random forest (RF) and LightGBM (LGB) across various training set sizes, we demonstrated that FSCS significantly mitigates data imbalance, enhancing model accuracy, feature importance estimation, and interpretability. Two widely used hydrological data sets were analyzed: a large multiclass forest cover type data set from Roosevelt National Forest (110,393 samples) and a continuous‐value data set of soil properties from the USKSAT database (18,729 samples). In total, 1,720 models were constructed and optimized, combining different sampling methods, training set sizes, and algorithms. Balanced sampling, conditioned Latin hypercube sampling, and FSCS consistently outperformed simple random sampling. Despite using smaller training sets and simpler RF models, FSCS‐trained models matched or surpassed the performance of those using larger data sets or more complex LGB models. SHAP analysis revealed that FSCS enhanced feature–target relationship clarity, emphasizing feature interactions and improving model interpretability. These findings highlight the potential of advanced sampling methods for not only addressing data imbalance but also providing more accurate prior information for model training, thereby enhancing reliability, accuracy, and interpretability in ML for hydrological applications. 
653 |a Hydrologic data 
653 |a Datasets 
653 |a Algorithms 
653 |a Sampling methods 
653 |a Soil properties 
653 |a Hydraulic conductivity 
653 |a Hydrology 
653 |a Machine learning 
653 |a Hypercubes 
653 |a Random sampling 
653 |a Performance evaluation 
653 |a Learning algorithms 
653 |a Contamination 
653 |a Accuracy 
653 |a Soil sciences 
653 |a Training 
653 |a Groundwater 
653 |a Statistical sampling 
653 |a Decision making 
653 |a National forests 
653 |a Hydraulics 
653 |a Latin hypercube sampling 
653 |a Environmental 
700 1 |a Shu, Longcang  |u The National Key Laboratory of Water Disaster Prevention, Hohai University, Nanjing, China 
700 1 |a Wang, Zhe  |u The National Key Laboratory of Water Disaster Prevention, Hohai University, Nanjing, China 
700 1 |a Zhou, Long  |u The National Key Laboratory of Water Disaster Prevention, Hohai University, Nanjing, China 
700 1 |a Niu, Shuyao  |u The National Key Laboratory of Water Disaster Prevention, Hohai University, Nanjing, China 
700 1 |a Ren, Huazhun  |u The National Key Laboratory of Water Disaster Prevention, Hohai University, Nanjing, China 
700 1 |a Liu, Bo  |u The National Key Laboratory of Water Disaster Prevention, Hohai University, Nanjing, China 
700 1 |a Lu, Chengpeng  |u The National Key Laboratory of Water Disaster Prevention, Hohai University, Nanjing, China 
773 0 |t Water Resources Research  |g vol. 61, no. 10 (Oct 1, 2025) 
786 0 |d ProQuest  |t ABI/INFORM Global 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3266095095/abstract/embedded/Y2VX53961LHR7RE6?source=fedsrch 
856 4 0 |3 Full Text  |u https://www.proquest.com/docview/3266095095/fulltext/embedded/Y2VX53961LHR7RE6?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3266095095/fulltextPDF/embedded/Y2VX53961LHR7RE6?source=fedsrch