Deep Reinforcement Learning based Online Scheduling Policy for Deep Neural Network Multi-Tenant Multi-Accelerator Systems

Saved in:
Bibliographic Details
Published in: arXiv.org (Apr 13, 2024), p. n/a
Main Author: Blanco, Francesco G
Other Authors: Russo, Enrico; Palesi, Maurizio; Patti, Davide; Ascia, Giuseppe; Catania, Vincenzo
Published:
Cornell University Library, arXiv.org
Subjects:
Online Access: Citation/Abstract
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3039629878
003 UK-CbPIL
022 |a 2331-8422 
035 |a 3039629878 
045 0 |b d20240413 
100 1 |a Blanco, Francesco G 
245 1 |a Deep Reinforcement Learning based Online Scheduling Policy for Deep Neural Network Multi-Tenant Multi-Accelerator Systems 
260 |b Cornell University Library, arXiv.org  |c Apr 13, 2024 
513 |a Working Paper 
520 3 |a Currently, there is a growing trend of outsourcing the execution of DNNs to cloud services. For service providers, managing multi-tenancy and ensuring high-quality service delivery, particularly meeting stringent execution-time constraints, is of paramount importance, all while remaining cost-effective. In this context, heterogeneous multi-accelerator systems become increasingly relevant. This paper presents RELMAS, a low-overhead deep reinforcement learning algorithm designed for the online scheduling of DNNs in multi-tenant environments, taking into account the dataflow heterogeneity of accelerators and memory bandwidth contention. By doing so, service providers can employ the most efficient scheduling policy for user requests, optimizing Service-Level-Agreement (SLA) satisfaction rates and enhancing hardware utilization. Applying RELMAS to a heterogeneous multi-accelerator system composed of various instances of Simba and Eyeriss sub-accelerators yielded up to a 173% improvement in SLA satisfaction rate over state-of-the-art scheduling techniques across different workload scenarios, with less than a 1.5% energy overhead. 
653 |a Scheduling 
653 |a Algorithms 
653 |a Deep learning 
653 |a System effectiveness 
653 |a Computer aided scheduling 
653 |a Machine learning 
653 |a Artificial neural networks 
653 |a Accelerators 
653 |a Heterogeneity 
700 1 |a Russo, Enrico 
700 1 |a Palesi, Maurizio 
700 1 |a Patti, Davide 
700 1 |a Ascia, Giuseppe 
700 1 |a Catania, Vincenzo 
773 0 |t arXiv.org  |g (Apr 13, 2024), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3039629878/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2404.08950
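The abstract describes learning an online policy that assigns incoming DNN jobs to heterogeneous accelerators so that SLA deadlines are met. The following is a minimal illustrative sketch of that idea, not the paper's actual RELMAS algorithm: it uses tabular Q-learning with an SLA-based reward over a toy two-accelerator latency model. All names, latency figures, and deadlines are invented for illustration.

```python
import random

# Hypothetical latency model: fixed setup cost + per-unit compute cost.
# The accelerator names allude to the paper's Simba/Eyeriss sub-accelerators,
# but the numbers are purely illustrative assumptions.
ACCEL_LATENCY = {
    "simba_like":   (5.0, 1.0),  # (setup, per-unit): favors large jobs
    "eyeriss_like": (0.5, 2.0),  # low setup:         favors small jobs
}

def latency(accel, size):
    setup, per_unit = ACCEL_LATENCY[accel]
    return setup + per_unit * size

def train_policy(episodes=2000, eps=0.1, lr=0.2, seed=0):
    """Tabular Q-learning over (job-size bucket, accelerator) pairs."""
    rng = random.Random(seed)
    q = {(b, a): 0.0 for b in ("small", "large") for a in ACCEL_LATENCY}
    for _ in range(episodes):
        size = rng.choice([1, 10])                # incoming job size
        bucket = "small" if size <= 2 else "large"
        sla = 3.0 if bucket == "small" else 18.0  # per-job deadline
        if rng.random() < eps:                    # explore
            accel = rng.choice(list(ACCEL_LATENCY))
        else:                                     # exploit learned values
            accel = max(ACCEL_LATENCY, key=lambda a: q[(bucket, a)])
        # Reward: +1 if the SLA deadline was met, -1 otherwise.
        reward = 1.0 if latency(accel, size) <= sla else -1.0
        q[(bucket, accel)] += lr * (reward - q[(bucket, accel)])
    return q

def schedule(q, size):
    """Greedy online scheduling decision using the learned values."""
    bucket = "small" if size <= 2 else "large"
    return max(ACCEL_LATENCY, key=lambda a: q[(bucket, a)])
```

Under this toy model the policy learns to route small jobs to the low-setup accelerator and large jobs to the low-per-unit one; the real paper replaces the table with a neural policy and models dataflow heterogeneity and memory-bandwidth contention.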