Personalized and Efficient Distributed Machine Learning

Bibliographic Details
Published in: ProQuest Dissertations and Theses (2025)
Main Author: Ozkara, Kaan
Published: ProQuest Dissertations & Theses
Subjects:
Online Access: Citation/Abstract
Full Text - PDF
Description
Abstract: Modern machine learning faces two simultaneous explosions of scale: (i) the vast amount of data generated at the edge, and (ii) the number of model parameters, which has grown to billions, particularly for large language models (LLMs). First, data is now generated by an ecosystem of devices whose statistical characteristics and hardware capabilities differ widely, demanding personalized distributed learning that enables collaboration among heterogeneous clients while remaining tailored to their specific needs. Second, model capacity has grown to billions of parameters, stretching the limits of memory bandwidth and compute, which necessitates efficient training methodologies that preserve numerical stability. The goal of this thesis is to advance both fronts. For personalization, we develop a statistical framework that enables a fundamental understanding and characterization of client heterogeneity in both supervised and unsupervised learning settings. Our methods allow collaboration among clients that may hold data from distinct distributions and face different resource constraints. For large-scale training of LLMs, we propose novel meta-optimizers that speed up convergence, and lower-precision training strategies that reduce memory usage and increase throughput. The contributions of this thesis can be summarized as follows.

We introduce QuPeD (Quantized Personalization via Distillation), which couples a relaxed quantization objective with knowledge distillation so that each client learns a model compressed to its own precision and architectural constraints while still benefiting from collaboration. We develop an alternating proximal gradient algorithm that outperforms prior personalized FL baselines across diverse vision and language tasks, and our convergence analysis reveals the coupling between heterogeneity and resource constraints.

We generalize the hierarchical Bayes framework to personalized FL, unifying disparate personalization techniques (regularization, model interpolation, clustering) under one statistical lens; this lens also yields AdaPeD, an information-geometry-regularized algorithm. Extending the framework, we provide user-level differential privacy guarantees without sacrificing personalized performance. We further apply a similar framework to unsupervised learning: viewing client models through a hierarchical prior, we design adaptive algorithms for personalized dimensionality reduction and diffusion-based generation. Our analysis establishes the provable utility of collaboration for generative models, and experiments confirm these gains. For personalized diffusion generative models, we additionally propose an architectural form of personalization that uses individual client identities to steer a shared backbone.

For LLM training, we develop MADA, which parameterizes a spectrum of adaptive optimizer rules (Adam, AMSGrad, etc.) and employs hypergradient descent to adaptively select the best optimizer interpolation online. MADA consistently outperforms popular optimizers on pre-training, fine-tuning, and vision tasks, and our theory shows how optimizer interpolation tightens convergence bounds.

Finally, we propose using stochastic rounding (SR) for Adam optimizer updates to enable a fully BF16 training recipe. Theoretically, we show that SR yields implicit regularization of the loss function and faster convergence compared to deterministic rounding. Empirically, our recipe achieves higher throughput and lower memory usage while obtaining better validation perplexity than state-of-the-art mixed-precision training.
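The final contribution relies on stochastic rounding of optimizer updates to BF16. As a rough illustration of that mechanism only, not of the thesis's actual recipe, the sketch below rounds FP32 values to BF16 stochastically using a standard bit-manipulation trick; the function name sr_round_to_bf16 and the standalone usage are illustrative assumptions.

```python
import torch

def sr_round_to_bf16(x: torch.Tensor) -> torch.Tensor:
    """Stochastically round an FP32 tensor to BF16.

    BF16 keeps the upper 16 bits of the FP32 bit pattern. Instead of
    round-to-nearest, add uniform noise to the 16 bits that will be
    discarded and then truncate: each value lands on one of its two BF16
    neighbours with probability proportional to proximity, so the rounding
    error is zero-mean. (Inf/NaN edge cases are ignored for brevity.)
    """
    bits = x.float().contiguous().view(torch.int32)
    noise = torch.randint(0, 1 << 16, bits.shape, dtype=torch.int32, device=bits.device)
    rounded = (bits + noise) & -(1 << 16)  # zero out the low 16 bits
    return rounded.view(torch.float32).to(torch.bfloat16)

# Illustrative low-precision parameter update: weights live in BF16, the update
# (e.g. an Adam step lr * m_hat / (sqrt(v_hat) + eps)) is computed in FP32, and
# the result is written back with stochastic rounding so that small updates are
# not systematically lost to round-to-nearest.
param_bf16 = torch.randn(1024, dtype=torch.bfloat16)
update_fp32 = 1e-3 * torch.randn(1024)
param_bf16 = sr_round_to_bf16(param_bf16.float() - update_fp32)
```

Unlike round-to-nearest, the stochastically rounded value equals the input in expectation, which is why small updates are not systematically lost when parameters are kept entirely in BF16.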
ISBN: 9798280756014
Source: ProQuest Dissertations & Theses Global