Robustness to Multiplicity in the Machine Learning Pipeline

Bibliographic Details
Publication: ProQuest Dissertations and Theses (2025)
First author: Meyer, Anna P.
Publisher: ProQuest Dissertations & Theses
Additional bibliographic details
Abstract: Machine learning (ML) is increasingly used as a tool to replace or aid human decision making in high-stakes settings like finance, medicine, and employment. The outcomes of these models can be pivotal in individuals' lives, determining, for instance, whether someone gets a loan, access to proper medical care, or a job. However, these models' decisions are often not robust to multiplicity: that is, there are often multiple models that perform similarly well in aggregate, yet give conflicting predictions for individual samples. This multiplicity can stem from any part of the ML pipeline and affects not only predictions, but also explanations and global model behavior such as adherence to fairness goals.

In this dissertation, we study when multiplicity occurs, how to measure and control for it, and what its implications are for the fairness of using ML models. Our goal is to understand when ML models' outputs are reliable, so that model developers, deployers, and decision subjects can interact with models in an informed way.

First, we propose dataset multiplicity, i.e., the idea that multiple datasets may be equally appropriate to use as training data, yet yield models whose predictions disagree. We analyze prediction robustness to dataset multiplicity for two common model architectures, decision trees and linear models. The results of these analyses can be used to increase confidence in model predictions when the robustness proof succeeds, or to prompt caution against blindly relying on the model's outcomes otherwise. Then, we study the stability of explanations under multiplicity, in particular dataset multiplicity that can be represented as data shift. We show how to improve explanation robustness in this setting, which allows model developers and explanation recipients to be more confident that the provided explanations will remain valid over time. Finally, we perform the first study of non-expert stakeholders' views on how multiplicity affects the fairness of ML models and how decisions should be made in the presence of multiplicity. Our results indicate that lay stakeholders have strong feelings about how multiplicity is resolved, but that these opinions are often at odds with what the existing literature recommends.
ISBN: 9798280780651
Source: ProQuest Dissertations & Theses Global
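As an illustrative aside (not code from the dissertation itself): the multiplicity phenomenon the abstract describes can be reproduced in a few lines. The sketch below, which assumes scikit-learn is available and uses synthetic data, trains two logistic regression models on bootstrap resamples of the same training set, standing in for two "equally appropriate" datasets, and checks that they reach similar aggregate accuracy while disagreeing on individual test points.

```python
# Minimal sketch of predictive multiplicity (illustrative only; not the
# dissertation's method). Assumes scikit-learn; synthetic data keeps it runnable.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, flip_y=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

# Two bootstrap resamples play the role of two "equally appropriate" training sets.
rng = np.random.default_rng(0)
models = []
for _ in range(2):
    idx = rng.integers(0, len(X_tr), size=len(X_tr))
    models.append(LogisticRegression(max_iter=1000).fit(X_tr[idx], y_tr[idx]))

preds = [m.predict(X_te) for m in models]
accs = [m.score(X_te, y_te) for m in models]
disagreement = np.mean(preds[0] != preds[1])

# Similar aggregate accuracy, yet conflicting predictions for some individuals.
print(f"test accuracies: {accs[0]:.3f} vs {accs[1]:.3f}")
print(f"fraction of test samples where the two models disagree: {disagreement:.3f}")
```

Bootstrapping is only one stand-in for dataset multiplicity; as the abstract notes, the same effect can arise from any part of the ML pipeline, including random seeds, hyperparameter choices, or preprocessing decisions.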