Development of a 5-Year Risk Prediction Model for Transition From Prediabetes to Diabetes Using Machine Learning: Retrospective Cohort Study

Detailed Bibliography
Published in: Journal of Medical Internet Research vol. 27 (2025), p. e73190
Author: Zhang, Yongsheng
Other Authors: Zhang, Hongyu, Wang, Dawei, Li, Na, Lv, Haoyue, Zhang, Guang
Edition/Publication Information:
Gunther Eysenbach MD MPH, Associate Professor
Subjects:
Online Access: Citation/Abstract
Full Text + Graphics
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3222368852
003 UK-CbPIL
022 |a 1438-8871 
024 7 |a 10.2196/73190  |2 doi 
035 |a 3222368852 
045 2 |b d20250101  |b d20251231 
100 1 |a Zhang, Yongsheng 
245 1 |a Development of a 5-Year Risk Prediction Model for Transition From Prediabetes to Diabetes Using Machine Learning: Retrospective Cohort Study 
260 |b Gunther Eysenbach MD MPH, Associate Professor  |c 2025 
513 |a Journal Article 
520 3 |a Background: Diabetes has emerged as a critical global public health crisis. Prediabetes, a transitional phase with 5%-10% annual progression to diabetes, offers a critical window for intervention. The lack of a 5-year risk prediction model for diabetes progression among Chinese individuals with prediabetes limits clinical decision-making support. Objective: This study aimed to develop and validate a machine learning–based 5-year risk prediction model of progression from prediabetes to diabetes for the Chinese population and to establish an interactive web-based platform that facilitates the identification of high-risk patients and early targeted interventions, ultimately reducing diabetes incidence and health care burdens. Methods: A retrospective cohort study was conducted on 2 prediabetes cohorts from 2 Chinese medical centers (primary cohort: n=6578; external validation cohort: n=2333), followed from 2019 to 2024. Participants meeting the American Diabetes Association (ADA) criteria (prediabetes: hemoglobin A1c [HbA1c] level of 5.7%-6.4%; diabetes: HbA1c level of ≥6.5%) were identified. A total of 42 variables (demographics, physical measures, and hematologic biomarkers) were collected using standardized protocols. Patients in the primary cohort were randomly split into training (70%) and test (30%) sets. Significant predictors were selected on the training set using recursive feature elimination methods, followed by prediction model development using 7 machine learning algorithms (logistic regression, random forest, support vector machine, multilayer perceptron, extreme gradient boosting machine, light gradient boosting machine, and categorical boosting machine [CatBoost]), optimized through grid search and 5-fold cross-validation. 
Model performance was assessed using the receiver operating characteristic curve, the precision-recall curve, accuracy, sensitivity, and specificity, as well as multiple other metrics, on both the test set and the external test set. Results: During the 5-year follow-up, 2610 (41.6%) and 760 (35.2%) participants progressed from prediabetes to diabetes, with mean annual progression rates of 8.34% and 7.04% in the primary cohort and the external cohort, respectively. With 14 features selected by the recursive feature elimination-logistic algorithm, the CatBoost model achieved the best performance in the test set and the external test set, with an area under the receiver operating characteristic curve of 0.819 and 0.807, respectively. It also showed the best accuracy, negative predictive value (NPV), and F1-score, as well as the best calibration, in both the test set and the external test set. Shapley Additive Explanations (SHAP) analysis highlighted the top 6 predictors (FBG, HDL, ALT/AST, BMI, age, and MONO), enabling targeted modification of these risk factors to reduce diabetes incidence. Conclusions: We developed a 5-year risk prediction model of progression from prediabetes to diabetes for the Chinese population. The CatBoost model showed the best predictive performance and could effectively identify individuals at high risk of diabetes. 
653 |a Triglycerides 
653 |a Risk reduction 
653 |a Diabetes 
653 |a Neutrophils 
653 |a Regression analysis 
653 |a Public health 
653 |a Risk factors 
653 |a Discrimination 
653 |a Calibration 
653 |a High density lipoprotein 
653 |a Cholesterol 
653 |a Cohort analysis 
653 |a Elimination 
653 |a Prediction models 
653 |a Algorithms 
653 |a Machine learning 
653 |a Hemoglobin 
653 |a Health care 
653 |a Clinical decision making 
653 |a Multimedia 
653 |a Blood pressure 
653 |a Biological markers 
653 |a Body mass index 
653 |a Glucose 
653 |a Variables 
653 |a High risk 
653 |a Diabetics 
653 |a Lipoproteins 
653 |a Tracking 
653 |a Tests 
653 |a Early intervention 
653 |a Accuracy 
653 |a Creatinine 
653 |a Demography 
653 |a Risk 
653 |a Intervention 
653 |a Predictions 
653 |a Validity 
653 |a Patients 
653 |a Training 
653 |a Recursion 
653 |a Health services 
653 |a Decision making 
653 |a Body weight 
653 |a Medical decision making 
700 1 |a Zhang, Hongyu 
700 1 |a Wang, Dawei 
700 1 |a Li, Na 
700 1 |a Lv, Haoyue 
700 1 |a Zhang, Guang 
773 0 |t Journal of Medical Internet Research  |g vol. 27 (2025), p. e73190 
786 0 |d ProQuest  |t Library Science Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3222368852/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text + Graphics  |u https://www.proquest.com/docview/3222368852/fulltextwithgraphics/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3222368852/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch