Trustworthy Reuse in the Machine Learning Model Supply Chain

Bibliographic Details
Published in: ProQuest Dissertations and Theses (2025)
Main Author: Jiang, Wenxin
Published: ProQuest Dissertations & Theses
Subjects:
Online Access: Citation/Abstract
Full Text - PDF
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3283379482
003 UK-CbPIL
020 |a 9798265489371 
035 |a 3283379482 
045 2 |b d20250101  |b d20251231 
084 |a 66569  |2 nlm 
100 1 |a Jiang, Wenxin 
245 1 |a Trustworthy Reuse in the Machine Learning Model Supply Chain 
260 |b ProQuest Dissertations & Theses  |c 2025 
513 |a Dissertation/Thesis 
520 3 |a Machine Learning (ML) models are being adopted as components in software systems. Creating and specializing ML models from scratch has grown increasingly difficult as state-of-the-art architectures grow more complex. Following the path of traditional software engineering, ML engineers have begun to reuse pre-trained models (PTMs) and fine-tune them for downstream tasks and environments. This practice gives rise to the ML model supply chain. Traditional software reuse practices and challenges are well understood. However, the foundations for trustworthiness and reusability in the ML supply chain are still largely unexplored. To investigate the challenges and practices in the ML model supply chain, this dissertation combines empirical analyses, repository mining studies, and automated tool development to characterize the challenges and practices in PTM ecosystems in detail. Using mining software repository techniques, I have extracted, analyzed, and interpreted rich data from the deep learning reengineering process and from PTM packages. My work first adopts traditional software engineering methodologies to understand the challenges and practices of deep learning software. I also characterized PTM naming practices and developed a Deep Neural Network (DNN) architecture assessment pipeline (DARA) to enhance trust and promote more effective reuse in the ML model supply chain. My findings indicate that ML model naming conventions differ from those of traditional software packages. Building on my findings, I developed a package confusion detection system and adapted it to the ML model supply chain. To enable further research, I released two open-source datasets of PTM packages. This dissertation compares the PTM supply chain with the traditional software supply chain across multiple dimensions. 
The findings reveal that while the ML model supply chain shares many challenges with traditional software, it also introduces unique issues and practices. This work informs future research in ML supply chain analysis, model recommendation systems, model and dataset lineage tracking, and the automated simplification of reengineering processes. 
653 |a Machine learning 
653 |a Supply chains 
653 |a Data analysis 
653 |a Data collection 
653 |a Software packages 
653 |a Validity 
653 |a Deep learning 
653 |a Datasets 
653 |a Public domain 
653 |a Failure analysis 
653 |a Computer vision 
653 |a Open source software 
653 |a Artificial intelligence 
653 |a Computer science 
653 |a Operations research 
773 0 |t ProQuest Dissertations and Theses  |g (2025) 
786 0 |d ProQuest  |t ProQuest Dissertations & Theses Global 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3283379482/abstract/embedded/H09TXR3UUZB2ISDL?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3283379482/fulltextPDF/embedded/H09TXR3UUZB2ISDL?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u https://figshare.com/articles/thesis/_b_TRUSTWORTHY_REUSE_IN_THE_MACHINE_LEARNING_MODEL_SUPPLY_CHAIN_b_/28897502