A population spatialization method based on the integration of feature selection and an improved random forest model

Bibliographic Details
Published in: PLoS One vol. 20, no. 4 (Apr 2025), p. e0321263
Main author: Zhao, Zhen
Other authors: Guo, Hongmei, Jiang, Xueli, Zhang, Ying, Lu, Changjiang, Zhang, Can, He, Zonghang
Published:
Public Library of Science
Subjects:
Online access: Citation/Abstract
Full Text
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3186292046
003 UK-CbPIL
022 |a 1932-6203 
024 7 |a 10.1371/journal.pone.0321263  |2 doi 
035 |a 3186292046 
045 2 |b d20250401  |b d20250430 
084 |a 174835  |2 nlm 
100 1 |a Zhao, Zhen 
245 1 |a A population spatialization method based on the integration of feature selection and an improved random forest model 
260 |b Public Library of Science  |c Apr 2025 
513 |a Journal Article 
520 3 |a Ascertaining the precise and accurate spatial distribution of population is essential in conducting effective urban planning, resource allocation, and emergency rescue planning. The random forest (RF) model is widely used in population spatialization studies. However, the complexity of population distribution characteristics and the limitations of the RF model in processing unbalanced datasets affect population prediction accuracy. To address these issues, a population spatialization model that integrates feature selection with an improved random forest is proposed herein. First, recursive feature elimination with cross-validation (RFECV), the maximal information coefficient (MIC), and mean decrease accuracy (MDA) were used to select population distribution feature factors. A random forest was built on the feature subset selected by each method, giving three models: MIC-RF, RFECV-RF, and MDA-RF. The feature factors corresponding to the model with the highest accuracy were then taken as the optimal feature subset and used as input data for model construction. Additionally, considering the imbalance in population spatial distribution, we used the K-means++ clustering algorithm to cluster the optimal feature subset, and we used the bootstrap sampling method to extract the same amount of data from each cluster and fuse it with the training subset to build an improved random forest model. Based on this model, a spatial population distribution dataset of the Southern Sichuan Economic Zone at a 500 m resolution was generated. Finally, the population dataset generated in this study was compared and validated with the WorldPop dataset. The results showed that feature selection improved model accuracy to varying degrees compared with an RF built on all factors, and that MDA-RF had the lowest MAPE (0.174) and the highest R² (0.913) among the three models. Therefore, the feature factors selected using the MDA method were taken as the optimal feature subset. Compared with MDA-RF, the prediction accuracy of the improved RF built on the same subset increased by 1.7%, indicating that improving the bootstrap sampling of random forest by using the K-means++ clustering algorithm can enhance model accuracy to some extent. The proposed method was also more accurate than the WorldPop dataset: the MRE and RMSE of the WorldPop dataset were 57.24 and 23174.98, respectively, while the MRE and RMSE of the proposed method were 25.00 and 15776.50, respectively. This implies that the method proposed in this paper could simulate population spatial distribution more accurately. 
653 |a Resource allocation 
653 |a Accuracy 
653 |a Datasets 
653 |a Urban planning 
653 |a Algorithms 
653 |a Sampling methods 
653 |a Models 
653 |a Optimization 
653 |a Population distribution 
653 |a Data processing 
653 |a Feature selection 
653 |a Spatial distribution 
653 |a Sampling 
653 |a Clustering 
653 |a Machine learning 
653 |a Artificial intelligence 
653 |a Cardiovascular disease 
653 |a Variables 
653 |a Methods 
653 |a Rescue operations 
653 |a Spatial analysis 
653 |a Population studies 
653 |a Census of Population 
653 |a Subsets 
653 |a Allocation 
653 |a Recursion 
653 |a Elimination 
653 |a Environmental 
700 1 |a Guo, Hongmei 
700 1 |a Jiang, Xueli 
700 1 |a Zhang, Ying 
700 1 |a Lu, Changjiang 
700 1 |a Zhang, Can 
700 1 |a He, Zonghang 
773 0 |t PLoS One  |g vol. 20, no. 4 (Apr 2025), p. e0321263 
786 0 |d ProQuest  |t Health & Medical Collection 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3186292046/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text  |u https://www.proquest.com/docview/3186292046/fulltext/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3186292046/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch
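
Illustrative note (not part of the catalog record): the abstract describes drawing the same amount of data from each K-means++ cluster by bootstrap sampling. The sketch below shows one possible reading of that single step, using only the Python standard library. The function name balanced_bootstrap, the parameter n_per_cluster, and the example labels are hypothetical; the cluster labels are assumed to have been computed already, and the abstract does not say how the draw is fused with the training subset, so that part is left out.

```python
import random
from collections import defaultdict


def balanced_bootstrap(cluster_labels, n_per_cluster, seed=0):
    """Draw n_per_cluster row indices, with replacement, from every cluster.

    Hypothetical helper: cluster_labels[i] is the cluster id of row i,
    assumed to come from a clustering step such as K-means++.
    """
    rng = random.Random(seed)

    # Group row indices by the cluster each row was assigned to.
    by_cluster = defaultdict(list)
    for idx, label in enumerate(cluster_labels):
        by_cluster[label].append(idx)

    drawn = []
    for label in sorted(by_cluster):
        # Sampling with replacement lets a small cluster contribute
        # as many rows as a large one.
        drawn.extend(rng.choices(by_cluster[label], k=n_per_cluster))
    return drawn


# Made-up labels: cluster 0 dominates, cluster 2 has a single row.
labels = [0] * 8 + [1] * 3 + [2] * 1
sample = balanced_bootstrap(labels, n_per_cluster=4)
counts = {c: sum(1 for i in sample if labels[i] == c) for c in set(labels)}
```

With these made-up labels, each cluster contributes four indices to the sample regardless of its size, which is the balancing effect the abstract attributes to the modified bootstrap step.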