From Understanding to Improving Artificial Intelligence: New Frontiers in Machine Learning Explanations
| Published in: | ProQuest Dissertations and Theses (2025) |
|---|---|
| Main author: | |
| Published: | ProQuest Dissertations & Theses |
| Subjects: | |
| Online access: | Citation/Abstract · Full Text - PDF |
| Abstract: | As machine learning systems increasingly shape outcomes in high-stakes domains, the need to understand, trust, and effectively guide their decision-making grows urgent. This dissertation advances the field of machine learning explainability, offering a cohesive framework for enabling AI systems whose underlying reasoning is transparent, resilient, and actionable. By examining three critical frontiers—explainability amidst adversarial robustness, scalable rationale generation for large language models (LLMs), and decoding LLM behavior under iterative prompting—this work illuminates how explanations can inform, protect, and empower stakeholders. The first part reveals how adversarial training, while bolstering model security, can inadvertently undermine the provision of meaningful, low-cost algorithmic recourse. This tension exposes trade-offs between securing decision boundaries and preserving explanations that help individuals improve their predicted outcomes. The second part introduces a novel approach to scaling explanations without human annotation, integrating post hoc attributions from smaller, more interpretable proxy models directly into LLM prompting (see the sketch after this record). This not only reduces the need for manual rationales but also demonstrates that automatically generated explanations can actively guide complex models toward more coherent and well-founded reasoning. The final part focuses on decoding LLM behavior through iterative prompting. While one might expect repeated user-model interactions to improve understanding and truthfulness, naïve iterative prompting can paradoxically degrade factual alignment and confidence calibration. By carefully analyzing how LLMs respond to iterative queries, the dissertation uncovers new insights into model tendencies, including over-apologizing and sycophantic patterns, and develops strategies to mitigate these issues. This examination shows that how we interact with models—how we request, refine, and interpret explanations—fundamentally shapes model reliability and clarity. Collectively, these contributions emphasize that robust, scalable, and iteratively refined explanations are both feasible and vital. By reconciling adversarial defenses with user-friendly recourse, automating rationales for complex models, and decoding LLM behaviors through iterative engagement, the dissertation provides a principled path toward AI systems whose inner workings can be understood, trusted, and responsibly guided by human stakeholders. |
| ISBN: | 9798304962827 |
| Source: | ProQuest Dissertations & Theses Global |
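
The second contribution described in the abstract, splicing post hoc attributions from a small proxy model into an LLM prompt, can be pictured with a minimal sketch. The code below is purely illustrative and is not the dissertation's actual pipeline: it assumes a toy sentiment task, uses scikit-learn's `LogisticRegression` as the small interpretable proxy, treats its coefficients as post hoc attributions, and formats the top-scoring tokens as hints in a prompt intended for a hypothetical larger model. All data, names, and prompt wording here are assumptions.

```python
# Hypothetical sketch of attribution-augmented prompting:
# a small proxy model supplies per-token attribution scores,
# which are spliced into the prompt given to a larger LLM.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training data for the proxy model (illustrative only).
texts = ["great movie, loved it", "terrible plot, boring acting",
         "wonderful cast", "awful and dull"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
proxy = LogisticRegression().fit(X, labels)

def attributions(text, top_k=3):
    """Score each known token in `text` by its proxy-model coefficient."""
    vocab = vectorizer.vocabulary_
    coefs = proxy.coef_[0]
    scored = [(tok, coefs[vocab[tok]])
              for tok in vectorizer.build_analyzer()(text) if tok in vocab]
    return sorted(scored, key=lambda t: abs(t[1]), reverse=True)[:top_k]

def build_prompt(text):
    """Embed the proxy attributions as rationale hints for the larger model."""
    hints = ", ".join(f"'{tok}' ({score:+.2f})"
                      for tok, score in attributions(text))
    return (f"Review: {text}\n"
            f"Salient words (proxy attribution scores): {hints}\n"
            f"Using these cues, classify the sentiment and explain briefly.")

print(build_prompt("loved the wonderful acting"))
```

Using coefficient magnitudes keeps the proxy fully transparent; under the same assumed setup, any other post hoc attribution method over a small model could supply the hints, with no human-written rationales required.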