Hardware Accelerator Design by Using RT-Level Power Optimization Techniques on FPGA for Future AI Mobile Applications

Bibliographic Details
Published in: Electronics vol. 14, no. 16 (2025), p. 3317-3329
Main Author: Achyuth, Gundrapally
Other Authors: Shah, Yatrik Ashish; Vemuri, Sai Manohar; Choi Kyuwon (Ken)
Published:
MDPI AG
Subjects:
Online Access: Citation/Abstract
Full Text + Graphics
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3244013093
003 UK-CbPIL
022 |a 2079-9292 
024 7 |a 10.3390/electronics14163317  |2 doi 
035 |a 3244013093 
045 2 |b d20250101  |b d20251231 
084 |a 231458  |2 nlm 
100 1 |a Achyuth, Gundrapally 
245 1 |a Hardware Accelerator Design by Using RT-Level Power Optimization Techniques on FPGA for Future AI Mobile Applications 
260 |b MDPI AG  |c 2025 
513 |a Journal Article 
520 3 |a In resource-constrained edge environments, such as mobile devices, IoT systems, and electric vehicles, energy-efficient Convolutional Neural Network (CNN) accelerators on mobile Field Programmable Gate Arrays (FPGAs) are gaining significant attention for real-time object detection tasks. This paper presents a low-power implementation of the Tiny YOLOv4 object detection model on the Xilinx ZCU104 FPGA platform using Register Transfer Level (RTL) optimization techniques. We propose three RTL techniques: (i) Local Explicit Clock Enable (LECE), (ii) operand isolation, and (iii) Enhanced Clock Gating (ECG). We also propose a novel low-power design of the Multiply-Accumulate (MAC) operation, one of the main components of AI algorithms, that eliminates redundant signal switching activity. The Tiny YOLOv4 model, trained on the COCO dataset, was quantized and compiled using the Tensil toolchain for fixed-point inference deployment. Post-implementation evaluation using Vivado 2022.2 demonstrates a reduction of around 29.4% in total on-chip power. Our design supports real-time detection throughput while maintaining high accuracy, making it ideal for deployment in battery-constrained environments such as drones, surveillance systems, and autonomous vehicles. These results highlight the effectiveness of RT-level power optimization for scalable and sustainable edge AI deployment. 
610 4 |a Xilinx Inc 
653 |a Electric vehicles 
653 |a Applications programs 
653 |a Optimization techniques 
653 |a Artificial neural networks 
653 |a Optimization 
653 |a Mobile computing 
653 |a Mathematical functions 
653 |a Power management 
653 |a Unmanned aerial vehicles 
653 |a Field programmable gate arrays 
653 |a Object recognition 
653 |a Python 
653 |a Real time 
653 |a Constraints 
653 |a Surveillance systems 
653 |a Design techniques 
653 |a Efficiency 
700 1 |a Shah, Yatrik Ashish 
700 1 |a Vemuri, Sai Manohar 
700 1 |a Choi Kyuwon (Ken) 
773 0 |t Electronics  |g vol. 14, no. 16 (2025), p. 3317-3329 
786 0 |d ProQuest  |t Advanced Technologies & Aerospace Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3244013093/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full Text + Graphics  |u https://www.proquest.com/docview/3244013093/fulltextwithgraphics/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3244013093/fulltextPDF/embedded/L8HZQI7Z43R0LA5T?source=fedsrch