An Empirical Analysis of AutoML Tools and Techniques with Automated Feature Engineering

Bibliographic Details
Published in: ProQuest Dissertations and Theses (2022)
Main author: Shi, Kevin
Publisher: ProQuest Dissertations & Theses
Description
Abstract: Automated machine learning (AutoML) is an approach to automating the creation of machine learning pipelines and models. The ability to create a machine learning pipeline automatically would allow users without machine learning knowledge to build and use machine learning systems. Existing machine learning practitioners can also use these automated approaches to simplify the creation of machine learning systems. As with any tool, effective evaluations of AutoML tools are necessary to ensure that users can select the correct tool for their machine learning task.

Current evaluations of automated machine learning are performed on simple, general-purpose datasets, and depending on the machine learning task, these datasets may not provide the information needed for a meaningful comparison. There is also limited work on whether AutoML systems can generate models comparable to those built by domain experts on domain-specific data. Many current AutoML approaches automate only a small part of the machine learning pipeline. For AutoML to remove the need for machine learning knowledge among its users, the entire machine learning pipeline will need to be automated. Automating the feature engineering process is the next step of automation for many current AutoML approaches.

In this thesis, we present an empirical analysis of current open-source AutoML tools for tasks within the cybersecurity domain, highlight the current weaknesses of AutoML tools, and evaluate the performance of popular AutoML tools on cybersecurity datasets. In addition, we propose a method for augmenting existing AutoML tools with automated feature engineering and assess the impact of different feature generation approaches and their effect on total pipeline creation time.
ISBN:9798351477848
Source: ProQuest Dissertations & Theses Global