Towards Reliable and Trustworthy Pipelines for MLOps and LLMOps

Bibliographic Details
Published in: ProQuest Dissertations and Theses (2025)
Main Author: Abbassi, Altaf Allah
Published: ProQuest Dissertations & Theses
Subjects:
Online Access: Citation/Abstract; Full Text - PDF
Description
Abstract:
Machine Learning (ML) models and Large Language Models (LLMs) have demonstrated strong capabilities in automating tasks traditionally performed manually. Their effectiveness has led to their integration into critical applications, where they form the backbone of complex AI-based systems. To manage the full lifecycle of these models, operational frameworks such as Machine Learning Operations (MLOps) and Large Language Model Operations (LLMOps) have emerged, offering tailored tools and best practices. MLOps and LLMOps pipelines enable the continuous deployment of improved models, either by updating to more capable versions or by adapting models to evolving, dynamic environments through Continuous Training (CT) on fresh production data. These practices aim to enhance the dependability of AI-based systems. However, despite offering best practices and tools that support robustness, MLOps and LLMOps pipelines do not inherently guarantee reliability or trustworthiness. For instance, deploying a sub-optimal model may lead to performance degradation instead of improvement, ultimately compromising the reliability of the entire pipeline. Such reliability issues can lead to costly failures, loss of stakeholder trust, and critical errors in high-stakes applications.

In this thesis, we aim to improve the reliability and trustworthiness of both MLOps and LLMOps pipelines. The first part focuses on MLOps and on enhancing the reliability of models updated through CT workflows. Although CT is designed to improve the performance of AI-based systems, it can also introduce risks when production data is noisy and poorly managed, leading to catastrophic regressions or silent performance degradation. In practice, production data may suffer from distribution drift or be automatically labeled with low confidence, making it unsuitable for reliable CT. To address these challenges and promote more robust CT pipelines, we propose a reliable maintenance approach based on a filtering mechanism for incoming data. This method excludes low-confidence instances, which are likely to be mislabeled, as well as samples that deviate significantly from the original distribution. Filtering in this way safeguards the CT process and supports more reliable model updates over time.

The second part of this thesis focuses on LLMOps, particularly pipelines for code generation tasks. With the rapid evolution of large language models, new versions are frequently released, often accompanied by promises of significant improvements. However, such updates can inadvertently introduce regressions, and even the most advanced models may produce inefficient code. This makes it difficult to maintain consistent quality and long-term reliability across model versions. To address these challenges, we first propose a taxonomy of inefficiencies commonly observed in LLM-generated code. This taxonomy provides a structured basis for systematically evaluating model outputs and identifying recurring flaws. Building on this foundation, we introduce ReCatcher, a regression testing suite designed to detect both capability regressions and improvements between different LLM versions. ReCatcher thus contributes to a more transparent, trustworthy, and well-informed continuous deployment process for language models.

Together, these contributions aim to strengthen the reliability of AI-based systems by addressing key challenges in MLOps and LLMOps pipelines, while providing concrete solutions and actionable insights for safer model deployment.
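Illustrative note (not part of the catalogued record): the data-filtering step described in the abstract could, under some assumptions, be sketched as the Python function below. It assumes a scikit-learn-style classifier exposing predict_proba and uses a Mahalanobis-distance check against the training distribution; the function name filter_ct_batch, the thresholds, and the specific drift test are hypothetical illustrations, not the thesis's actual method.

    # Illustrative sketch: keep only production samples that (a) the current model
    # pseudo-labels with high confidence and (b) lie close to the training distribution.
    import numpy as np

    def filter_ct_batch(model, X_train, X_prod, conf_threshold=0.9, drift_quantile=0.99):
        """Return indices of production samples considered safe for Continuous Training."""
        # (a) Confidence filter: discard instances likely to be mislabeled.
        proba = model.predict_proba(X_prod)          # shape: (n_samples, n_classes)
        confident = proba.max(axis=1) >= conf_threshold

        # (b) Drift filter: Mahalanobis distance to the training data.
        mu = X_train.mean(axis=0)
        cov = np.cov(X_train, rowvar=False) + 1e-6 * np.eye(X_train.shape[1])
        inv_cov = np.linalg.inv(cov)
        diff = X_prod - mu
        dist = np.sqrt(np.einsum("ij,jk,ik->i", diff, inv_cov, diff))

        # Cutoff taken from the training data's own distance distribution.
        train_diff = X_train - mu
        train_dist = np.sqrt(np.einsum("ij,jk,ik->i", train_diff, inv_cov, train_diff))
        in_distribution = dist <= np.quantile(train_dist, drift_quantile)

        return np.where(confident & in_distribution)[0]

In this sketch, the confidence cutoff and the drift quantile are the two knobs a CT pipeline would tune before retraining on the retained samples.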
ISBN: 9798293882120
Source: ProQuest Dissertations & Theses Global