Towards machine learning fairness in classifying multicategory causes of deaths in colorectal or lung cancer patients
| Published in: | bioRxiv (Feb 19, 2025) |
|---|---|
| Main author: | |
| Other authors: | |
| Published: | Cold Spring Harbor Laboratory Press |
| Subjects: | |
| Online access: | Citation/Abstract; Full Text - PDF; Full text outside of ProQuest |
| Abstract: | Classification of patients' multicategory survival outcomes is important for personalized cancer treatment. Machine learning (ML) algorithms are increasingly used to inform healthcare decisions, but these models are vulnerable to biases in data collection and algorithm creation. ML models have previously been shown to exhibit racial bias, but their fairness towards patients of different age and sex groups has yet to be studied. Therefore, we compared the multimetric performance of five ML models (random forests, multinomial logistic regression, linear support vector classifier, linear discriminant analysis, and multilayer perceptron) when classifying colorectal cancer patients (n=515) of various age, sex, and racial groups using TCGA data. All five models exhibited biases for these sociodemographic groups. We then repeated the same process on lung adenocarcinoma (n=589) to validate our findings. Surprisingly, most models tended to perform more poorly overall for the largest sociodemographic groups. Methods to optimize model performance, including testing the model on merged age, sex, or racial groups, and creating a model trained on and used for an individual or merged sociodemographic group, show potential to reduce disparities in model performance across groups. Notably, these methods may improve ML fairness without penalizing the model for exhibiting bias and thereby sacrificing overall performance. Competing Interest Statement: The authors have declared no competing interest. |
|---|---|
| ISSN: | 2692-8205 |
| DOI: | 10.1101/2025.02.14.638368 |
| Source: | Biological Science Database |
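
The abstract describes comparing model performance across age, sex, and racial groups. As a rough illustration of that idea only, and not the authors' pipeline, the sketch below computes accuracy separately for each group and reports the largest gap between groups. The function names, the class labels, and the toy data are all hypothetical, and plain accuracy stands in for the multimetric evaluation the abstract mentions.

```python
from collections import defaultdict


def per_group_accuracy(y_true, y_pred, groups):
    """Return {group: accuracy} for parallel sequences of labels,
    predictions, and group tags (hypothetical helper)."""
    if not (len(y_true) == len(y_pred) == len(groups)):
        raise ValueError("y_true, y_pred, and groups must have equal length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


def accuracy_gap(scores):
    """Largest difference in accuracy between any two groups."""
    return max(scores.values()) - min(scores.values())


# Made-up labels and predictions; not data from the study.
y_true = ["cancer", "other", "alive", "cancer", "alive", "other", "cancer", "alive"]
y_pred = ["cancer", "alive", "alive", "other", "alive", "other", "cancer", "other"]
groups = ["female"] * 4 + ["male"] * 4

scores = per_group_accuracy(y_true, y_pred, groups)
print(scores)                # {'female': 0.5, 'male': 0.75}
print(accuracy_gap(scores))  # 0.25
```

Merging groups, one of the strategies the abstract mentions, could be mimicked here by mapping several tags in `groups` to a shared tag before calling `per_group_accuracy`.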