Development of a 5-Year Risk Prediction Model for Transition From Prediabetes to Diabetes Using Machine Learning: Retrospective Cohort Study
| Published in: | Journal of Medical Internet Research vol. 27 (2025), p. e73190 |
|---|---|
| Main author: | |
| Other authors: | , , , , |
| Published: | Gunther Eysenbach MD MPH, Associate Professor |
| Subjects: | |
| Online access: | Citation/Abstract; Full Text + Graphics; Full Text - PDF |
| Abstract: | Background: Diabetes has emerged as a critical global public health crisis. Prediabetes, a transitional phase with a 5%-10% annual rate of progression to diabetes, offers a critical window for intervention. The lack of a 5-year risk prediction model for diabetes progression among Chinese individuals with prediabetes limits clinical decision support. Objective: This study aimed to develop and validate a machine learning–based 5-year risk prediction model of progression from prediabetes to diabetes for the Chinese population and to establish an interactive web-based platform that facilitates the identification of high-risk patients and early targeted interventions, ultimately reducing diabetes incidence and health care burdens. Methods: A retrospective cohort study was conducted on 2 prediabetes cohorts from 2 Chinese medical centers (primary cohort: n=6578; external validation cohort: n=2333), followed from 2019 to 2024. Participants meeting the American Diabetes Association (ADA) criteria (prediabetes: hemoglobin A1c [HbA1c] level of 5.7%-6.4%; diabetes: HbA1c level of ≥6.5%) were identified. A total of 42 variables (demographics, physical measures, and hematologic biomarkers) were collected using standardized protocols. Patients in the primary cohort were randomly split into training (70%) and test (30%) sets. Significant predictors were selected on the training set using recursive feature elimination methods, and prediction models were then developed using 7 machine learning algorithms (logistic regression, random forest, support vector machine, multilayer perceptron, extreme gradient boosting machine, light gradient boosting machine, and categorical boosting machine [CatBoost]), optimized through grid search and 5-fold cross-validation. Model performance was assessed on both the test set and the external test set using the receiver operating characteristic curve, the precision-recall curve, accuracy, sensitivity, specificity, and several other metrics. Results: During the 5-year follow-up, 2610 (41.6%) participants in the primary cohort and 760 (35.2%) participants in the external cohort progressed from prediabetes to diabetes, corresponding to mean annual progression rates of 8.34% and 7.04%, respectively. Using 14 features selected by the recursive feature elimination-logistic algorithm, the CatBoost model achieved the best performance, with an area under the receiver operating characteristic curve of 0.819 in the test set and 0.807 in the external test set. It also achieved the best accuracy, negative predictive value (NPV), F1-score, and calibration in both the test set and the external test set. Shapley Additive Explanations (SHAP) analysis highlighted the top 6 predictors (FBG, HDL, ALT/AST, BMI, age, and MONO), enabling targeted modification of these risk factors to reduce diabetes incidence. Conclusions: We developed a 5-year risk prediction model of progression from prediabetes to diabetes for the Chinese population; the CatBoost model showed the best predictive performance and could effectively identify individuals at high risk of diabetes. |
|---|---|
| ISSN: | 1438-8871 |
| DOI: | 10.2196/73190 |
| Source: | Library Science Database |
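
The modeling workflow summarized in the abstract (a 70/30 split, recursive feature elimination with a logistic estimator down to 14 features, and grid search with 5-fold cross-validation scored by area under the ROC curve) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic (`make_classification`), scikit-learn's `GradientBoostingClassifier` stands in for CatBoost, and the hyperparameter grid, sample size, and random seeds are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cohort (assumption): 42 candidate variables
# and a binary outcome (progressed to diabetes or not).
X, y = make_classification(
    n_samples=1000, n_features=42, n_informative=10, random_state=0
)

# Random 70/30 split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

pipeline = Pipeline([
    ("scale", StandardScaler()),
    # Recursive feature elimination with a logistic estimator, keeping 14 features.
    ("select", RFE(LogisticRegression(max_iter=1000), n_features_to_select=14)),
    # Stand-in for CatBoost (assumption): scikit-learn gradient boosting.
    ("model", GradientBoostingClassifier(random_state=0)),
])

# Hypothetical hyperparameter grid, tuned with 5-fold cross-validation on AUC.
param_grid = {"model__n_estimators": [50, 100], "model__max_depth": [2, 3]}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="roc_auc")
search.fit(X_train, y_train)

# Evaluate the refitted best pipeline on the held-out test set.
test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
n_selected = int(search.best_estimator_.named_steps["select"].support_.sum())
print(f"selected features: {n_selected}, test AUC: {test_auc:.3f}")
```

In this sketch the feature selection step sits inside the pipeline, so it is refit within each cross-validation fold. The abstract describes selecting features once on the training set before model development, so the nesting here is a design choice of the illustration rather than a description of the study.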