Deep Learning Scheduling on a Field-Programmable Gate Array Cluster Using Configurable Deep Learning Accelerators

שמור ב:
מידע ביבליוגרפי
הוצא לאור ב:Information vol. 16, no. 4 (2025), p. 298
מחבר ראשי: Fang Tianyang
מחברים אחרים: Perez-Vicente, Alejandro, Johnson, Hans, Saniie Jafar
יצא לאור:
MDPI AG
נושאים:
גישה מקוונת:Citation/Abstract
Full Text + Graphics
Full Text - PDF
תגים: הוספת תג
אין תגיות, היה/י הראשונ/ה לתייג את הרשומה!
תיאור
Resumen:This paper presents the development and evaluation of a distributed system employing low-latency embedded field-programmable gate arrays (FPGAs) to optimize scheduling for deep learning (DL) workloads and to configure multiple deep learning accelerator (DLA) architectures. Aimed at advancing FPGA applications in real-time edge computing, this study focuses on achieving optimal latency for a distributed computing system. A novel methodology was adopted, using configurable hardware to examine clusters of DLAs, varying in architecture and scheduling techniques. The system demonstrated its capability to parallel-process diverse neural network (NN) models, manage compute graphs in a pipelined sequence, and allocate computational resources efficiently to intensive NN layers. We examined five configurable DLAs—Versatile Tensor Accelerator (VTA), Nvidia DLA (NVDLA), Xilinx Deep Processing Unit (DPU), Tensil Compute Unit (CU), and Pipelined Convolutional Neural Network (PipeCNN)—across two FPGA cluster types consisting of Zynq-7000 and Zynq UltraScale+ System-on-Chip (SoC) processors, respectively. Four deep neural network (DNN) workloads were tested: Scatter-Gather, AI Core Assignment, Pipeline Scheduling, and Fused Scheduling. These methods revealed an exponential decay in processing time up to 90% speedup, although deviations were noted depending on the workload and cluster configuration. This research substantiates FPGAs’ utility in adaptable, efficient DL deployment, setting a precedent for future experimental configurations and performance benchmarks.
ISSN:2078-2489
DOI:10.3390/info16040298
Fuente:Advanced Technologies & Aerospace Database