Tackling Data and Resource Heterogeneity for Performance Enhancement of Collaborative Learning Systems

Bibliographic Details
Published in: PQDT - Global (2025)
Main Author: Li, Ao
Published: ProQuest Dissertations & Theses
Online Access: Citation/Abstract; Full Text - PDF; Full text outside of ProQuest
Description
Abstract: Collaborative Learning (CL) is a decentralized machine learning framework that enables multiple clients to collaboratively solve tasks without sharing their raw data. In its early stages, the most prominent form of CL was Federated Learning (FL), originally introduced to leverage distributed information across clients for training a global model, with data heterogeneity and high communication costs as the primary concerns. Recently, with the rapid development of Pre-Trained Models (PTMs), there is a growing need for federated fine-tuning to efficiently adapt PTMs to downstream tasks using distributed, task-oriented datasets. However, since PTMs often encapsulate substantial proprietary knowledge, model privacy has emerged as a critical concern alongside data privacy. Moreover, advances in computational and storage capabilities have made it increasingly feasible to deploy PTMs on edge devices. In scenarios involving complex tasks that demand the integration of diverse capabilities, a pressing research challenge is how to effectively coordinate heterogeneous clients equipped with specialized PTMs for collaborative problem solving.

Our first work, FedDAD, considers unsupervised deep anomaly detection (DAD) in an FL setting with noisy and heterogeneous data. It leverages a small public dataset on the server as a shared normal anchor in the latent space to alleviate data heterogeneity, improving anomaly identification capability across clients.

Turning to PTMs, our second work, GenFFT, introduces a hybrid sharing mechanism that combines parameter sharing and knowledge sharing to protect model privacy. Rather than sharing entire PTMs during training, GenFFT uses a lightweight substitute model together with generation modules that are alternately updated by the server and clients to promote information exchange.

Finally, when clients possess private models with distinct capabilities, complex tasks can be solved through model collaboration without further parameter updates, which requires the server to generate a plan that effectively coordinates their cooperation. Since the server often fails to generate the optimal plan on the first attempt, we propose COP, a novel client-oriented planning framework that refines the initial plan before execution according to three specifically designed principles: solvability, completeness, and non-redundancy, thus enabling the collaborative resolution of complex tasks while preserving both data and model privacy.

Extensive experiments across a variety of datasets demonstrate that our proposed methods are broadly effective: whether in federated training of small models from scratch, federated fine-tuning of large pre-trained models, or collaborative inference without parameter updates, each approach achieves strong performance while preserving data privacy across diverse tasks.
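To make the shared-anchor idea behind FedDAD concrete, the following is a minimal sketch. The encoder architecture, the Deep-SVDD-style distance objective, and all function names are illustrative assumptions, not the dissertation's exact formulation:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Toy encoder standing in for the clients' shared DAD backbone."""
    def __init__(self, in_dim=32, latent_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 16), nn.ReLU(), nn.Linear(16, latent_dim)
        )

    def forward(self, x):
        return self.net(x)

def server_anchor(encoder, public_data):
    # The server embeds its small public "normal" dataset and broadcasts
    # the centroid as a shared anchor in latent space.
    with torch.no_grad():
        return encoder(public_data).mean(dim=0)

def client_update(encoder, batch, anchor, lr=1e-3):
    # Deep-SVDD-style local objective (an assumption here): pull the
    # client's presumed-normal embeddings toward the shared anchor, so
    # heterogeneous clients converge on a common notion of "normal".
    opt = torch.optim.SGD(encoder.parameters(), lr=lr)
    loss = (encoder(batch) - anchor).pow(2).sum(dim=1).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def anomaly_score(encoder, x, anchor):
    # Distance to the shared anchor serves as the anomaly score.
    with torch.no_grad():
        return (encoder(x) - anchor).pow(2).sum(dim=1)
```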
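Similarly, one plausible reading of GenFFT's hybrid sharing is sketched below: the private PTM stays on the client and only a lightweight substitute, trained by distillation on data from a generation module, is shared. The distillation objective, batch sizes, and the division of labor between server and client are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def client_round(private_ptm, substitute, generator, steps=10, noise_dim=16, lr=1e-3):
    # The private PTM never leaves the client; only the lightweight
    # substitute model is trained for sharing.
    opt = torch.optim.Adam(substitute.parameters(), lr=lr)
    for _ in range(steps):
        with torch.no_grad():
            x = generator(torch.randn(32, noise_dim))   # synthetic exchange data
            teacher_logits = private_ptm(x)             # private knowledge source
        student_logits = substitute(x)
        # Distill the PTM's knowledge into the substitute model.
        loss = F.kl_div(F.log_softmax(student_logits, dim=-1),
                        F.softmax(teacher_logits, dim=-1),
                        reduction="batchmean")
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Only the substitute's parameters are uploaded; in the alternating
    # scheme the server then updates the generation module and returns it.
    return substitute.state_dict()
```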
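Finally, COP's three refinement principles can be pictured as filters applied to a candidate plan before execution. The plan representation (steps tagged with a required skill) and the check logic are hypothetical simplifications of the framework:

```python
def refine_plan(plan, client_skills, required_skills):
    # Solvability: keep only steps that some client can actually execute.
    solvable = [step for step in plan
                if any(step["skill"] in skills for skills in client_skills.values())]
    # Non-redundancy: drop steps whose skill is already covered earlier.
    seen, refined = set(), []
    for step in solvable:
        if step["skill"] not in seen:
            seen.add(step["skill"])
            refined.append(step)
    # Completeness: report requirements the plan still fails to cover,
    # so the planner can revise before execution.
    missing = required_skills - {step["skill"] for step in refined}
    return refined, missing

plan = [{"skill": "ocr"}, {"skill": "ocr"}, {"skill": "translate"}, {"skill": "paint"}]
clients = {"A": {"ocr"}, "B": {"translate", "summarize"}}
print(refine_plan(plan, clients, {"ocr", "translate", "summarize"}))
# -> ([{'skill': 'ocr'}, {'skill': 'translate'}], {'summarize'})
```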
ISBN:9798263313180
Source: ProQuest Dissertations & Theses Global