Robustness to Multiplicity in the Machine Learning Pipeline

Bibliographic Details
Published in: ProQuest Dissertations and Theses (2025)
Main Author: Meyer, Anna P.
Published: ProQuest Dissertations & Theses
Subjects: Computer science; Computer engineering; Artificial intelligence
Online Access: Citation/Abstract
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3219202014
003 UK-CbPIL
020 |a 9798280780651 
035 |a 3219202014 
045 2 |b d20250101  |b d20251231 
084 |a 66569  |2 nlm 
100 1 |a Meyer, Anna P. 
245 1 |a Robustness to Multiplicity in the Machine Learning Pipeline 
260 |b ProQuest Dissertations & Theses  |c 2025 
513 |a Dissertation/Thesis 
520 3 |a Machine learning (ML) is increasingly used as a tool to replace or aid human decision making in high stakes settings like finance, medicine, and employment. The outcomes of these models can be pivotal in individuals’ lives, determining, for instance, whether someone gets a loan, access to proper medical care, or a job. However, these models’ decisions are often not robust to multiplicity: i.e., there are often multiple models that perform similarly well in aggregate, yet give conflicting predictions for individual samples. This multiplicity can stem from any part of the ML pipeline and affects not only predictions, but also explanations and global model behavior like adherence to fairness goals. In this dissertation, we study when multiplicity occurs, how to measure and control for it, and what its implications are for the fairness of using ML models. Our goal is to be able to understand when ML models’ outputs are reliable, so that model developers, deployers, and decision subjects can interact with models in an informed way. First, we propose dataset multiplicity, i.e., that multiple datasets may be equally appropriate to use as training data, yet yield models whose predictions disagree. We analyze prediction robustness to dataset multiplicity for two common model architectures, decision trees and linear models. The results of these analyses can be used to increase confidence in model predictions if the robustness proof is successful, or to prompt caution in blindly relying on the model outcomes otherwise. Then, we study the stability of explanations under multiplicity, and in particular dataset multiplicity that can be represented as data shift. We show how to improve explanation robustness in this setting, which allows model developers and explanation recipients to be more confident that the provided explanations will remain valid over time. Finally, we perform the first study about non-expert stakeholders’ views towards how multiplicity affects the fairness of ML models and how decisions should be made in the presence of multiplicity. Our results indicate that lay stakeholders have strong feelings about how multiplicity is resolved, but that these opinions are often at odds with what the existing literature recommends. 
653 |a Computer science 
653 |a Computer engineering 
653 |a Artificial intelligence 
773 0 |t ProQuest Dissertations and Theses  |g (2025) 
786 0 |d ProQuest  |t ProQuest Dissertations & Theses Global 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3219202014/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3219202014/fulltextPDF/embedded/L8HZQI7Z43R0LA5T?source=fedsrch
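
A note on the abstract's central notion, for readers unfamiliar with it: "multiplicity" means that several models with nearly identical aggregate accuracy can still assign conflicting predictions to individual samples, and "dataset multiplicity" means that equally appropriate training datasets can induce that disagreement. The sketch below is not code from the dissertation; it is a minimal, assumed scikit-learn setup in which bootstrap resamples stand in for "equally appropriate" training data, and it simply measures how often the two resulting models conflict.

```python
# Illustrative sketch only (not from the dissertation): two equally plausible
# training sets can yield similarly accurate models whose individual
# predictions conflict -- the "multiplicity" described in the abstract.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

rng = np.random.default_rng(0)

def resample(X, y, rng):
    """Bootstrap resample: a stand-in for one 'equally appropriate' dataset."""
    idx = rng.integers(0, len(y), size=len(y))
    return X[idx], y[idx]

# Two plausible training datasets -> two fitted models of the same class.
m1 = LogisticRegression(max_iter=1000).fit(*resample(X_tr, y_tr, rng))
m2 = LogisticRegression(max_iter=1000).fit(*resample(X_tr, y_tr, rng))

p1, p2 = m1.predict(X_te), m2.predict(X_te)
print(f"accuracy (model 1): {accuracy_score(y_te, p1):.3f}")
print(f"accuracy (model 2): {accuracy_score(y_te, p2):.3f}")
# Aggregate accuracy is nearly identical, yet some individuals receive
# conflicting predictions from the two models.
print(f"disagreement rate : {(p1 != p2).mean():.3f}")
```

Even when the two accuracies match to within a fraction of a percent, samples near the decision boundary typically receive conflicting labels; the dissertation goes further by certifying when a given prediction is robust to such multiplicity, which this toy example does not attempt.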