Evaluation of Distributed Statistical Learning

Bibliographic Details
Published in:	PQDT - Global (2025)
Main author:	Ávila, André Ismael Ferraz
Published:	ProQuest Dissertations & Theses
Subjects:	User interface; Simulation; Artificial intelligence; Privacy; General Data Protection Regulation; Peers; Reproducibility; Computer science
Online access:	Citation/Abstract; Full Text - PDF

Description
Abstract:	Natural Language Processing models have gained significant attention due to the development of large-scale models such as OpenAI’s GPT. These models rely on extensive and diverse datasets, which presents data-sharing challenges such as privacy and ownership. Federated Learning addresses those challenges by allowing multiple actors to collaboratively train a shared model without exchanging the raw data. Not sharing the raw data enables the use of data that, under normal conditions, could not be shared due to privacy, ethical or legal concerns. This decentralised approach minimises the central storage requirements while also lessening data privacy risks. Statistical learning algorithms are commonly used in Federated Learning. However, they must be adapted into distributed statistical learning algorithms in order to handle decentralised data. These distributed algorithms are still under development and therefore need empirical results to assess their theoretical foundations. Due to the distributed nature of the algorithms, performing an empirical evaluation is a complex task, as the environments these algorithms operate in and, consequently, the adversities they encounter are difficult to replicate physically and consistently. This dissertation aims to support the development, improvement and analysis of distributed statistical learning algorithms by introducing an evaluation framework implemented as a discrete-event simulator. The existing discrete-event simulators are compared and analysed with the evaluation of the target algorithms in mind. Then, a simulator is designed and purpose-built to be extensible, configurable and observable. The developed simulator is validated by comparing its behaviour to that of an already established simulator, and its metrics visualisation capabilities are demonstrated. Furthermore, the simulator is used to evaluate a distributed statistical learning algorithm.
Based on the evaluation results, a solution is proposed to address the algorithm’s identified functional shortcomings. The proposed solution is also evaluated using the designed simulator, and its results are compared to those of the original implementation.
ISBN:	9798270220884
Source:	ProQuest Dissertations & Theses Global
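
The abstract describes evaluating distributed learning algorithms inside a discrete-event simulator. As a rough illustration of that general idea only, the Python sketch below shows a minimal event queue driving a toy distributed mean estimate. It is not the dissertation's simulator or algorithm: the names `Simulator`, `Server` and `run_demo`, the client data and the message delays are all hypothetical. Each client sends only a local sum and count, so raw data never leaves the client.

```python
import heapq


class Simulator:
    """Hypothetical minimal discrete-event simulator: a clock and a time-ordered queue."""

    def __init__(self):
        self.now = 0.0
        self._queue = []  # entries are (event_time, sequence_number, callback)
        self._seq = 0     # tie-breaker so simultaneous events keep insertion order
        self.log = []     # recorded observations, for later inspection

    def schedule(self, delay, callback):
        """Register a callback to fire `delay` time units after the current time."""
        heapq.heappush(self._queue, (self.now + delay, self._seq, callback))
        self._seq += 1

    def run(self):
        """Pop events in time order, advancing the clock to each event's time."""
        while self._queue:
            self.now, _, callback = heapq.heappop(self._queue)
            callback()


class Server:
    """Aggregates summary statistics sent by clients (assumed toy protocol)."""

    def __init__(self, sim):
        self.sim = sim
        self.total = 0.0
        self.count = 0

    def receive(self, client_id, local_sum, local_count):
        self.total += local_sum
        self.count += local_count
        # Record the arrival time, the sender and the current global mean estimate.
        self.sim.log.append((self.sim.now, client_id, self.total / self.count))


def run_demo():
    sim = Simulator()
    server = Server(sim)
    # Hypothetical clients: (private local data, simulated network delay).
    clients = {
        "a": ([1.0, 2.0, 3.0], 5.0),
        "b": ([10.0], 1.0),
        "c": ([4.0, 6.0], 3.0),
    }
    for cid, (data, delay) in clients.items():
        s, n = sum(data), len(data)
        # Default arguments bind the current values for each scheduled message.
        sim.schedule(delay, lambda cid=cid, s=s, n=n: server.receive(cid, s, n))
    sim.run()
    return sim.log


if __name__ == "__main__":
    for time, client, estimate in run_demo():
        print(f"t={time:.1f} from={client} mean={estimate:.3f}")
```

Because time is simulated rather than measured, the same delays always produce the same event order, which is what makes adverse conditions such as slow or reordered messages repeatable across runs.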