Hardware Accelerator Design by Using RT-Level Power Optimization Techniques on FPGA for Future AI Mobile Applications
| Published in: | Electronics vol. 14, no. 16 (2025), p. 3317-3329 |
|---|---|
| Main author: | Achyuth, Gundrapally |
| Other authors: | Shah, Yatrik Ashish; Vemuri, Sai Manohar; Choi Kyuwon (Ken) |
| Published: | MDPI AG |
| Subjects: | |
| Online access: | Citation/Abstract; Full Text + Graphics; Full Text - PDF |
MARC
| LEADER | 00000nab a2200000uu 4500 | ||
|---|---|---|---|
| 001 | 3244013093 | ||
| 003 | UK-CbPIL | ||
| 022 | |a 2079-9292 | ||
| 024 | 7 | |a 10.3390/electronics14163317 |2 doi | |
| 035 | |a 3244013093 | ||
| 045 | 2 | |b d20250101 |b d20251231 | |
| 084 | |a 231458 |2 nlm | ||
| 100 | 1 | |a Achyuth, Gundrapally | |
| 245 | 1 | |a Hardware Accelerator Design by Using RT-Level Power Optimization Techniques on FPGA for Future AI Mobile Applications | |
| 260 | |b MDPI AG |c 2025 | ||
| 513 | |a Journal Article | ||
| 520 | 3 | |a In resource-constrained edge environments—such as mobile devices, IoT systems, and electric vehicles—energy-efficient Convolution Neural Network (CNN) accelerators on mobile Field Programmable Gate Arrays (FPGAs) are gaining significant attention for real-time object detection tasks. This paper presents a low-power implementation of the Tiny YOLOv4 object detection model on the Xilinx ZCU104 FPGA platform by using Register Transfer Level (RTL) optimization techniques. We proposed three RTL techniques in the paper: (i) Local Explicit Clock Enable (LECE), (ii) operand isolation, and (iii) Enhanced Clock Gating (ECG). A novel low-power design of Multiply-Accumulate (MAC) operations, which is one of the main components in the AI algorithm, was proposed to eliminate redundant signal switching activities. The Tiny YOLOv4 model, trained on the COCO dataset, was quantized and compiled using the Tensil tool-chain for fixed-point inference deployment. Post-implementation evaluation using Vivado 2022.2 demonstrates around 29.4% reduction in total on-chip power. Our design supports real-time detection throughput while maintaining high accuracy, making it ideal for deployment in battery-constrained environments such as drones, surveillance systems, and autonomous vehicles. These results highlight the effectiveness of RTL-level power optimization for scalable and sustainable edge AI deployment. | |
| 610 | 4 | |a Xilinx Inc | |
| 653 | |a Electric vehicles | ||
| 653 | |a Applications programs | ||
| 653 | |a Optimization techniques | ||
| 653 | |a Artificial neural networks | ||
| 653 | |a Optimization | ||
| 653 | |a Mobile computing | ||
| 653 | |a Mathematical functions | ||
| 653 | |a Power management | ||
| 653 | |a Unmanned aerial vehicles | ||
| 653 | |a Field programmable gate arrays | ||
| 653 | |a Object recognition | ||
| 653 | |a Python | ||
| 653 | |a Real time | ||
| 653 | |a Constraints | ||
| 653 | |a Surveillance systems | ||
| 653 | |a Design techniques | ||
| 653 | |a Efficiency | ||
| 700 | 1 | |a Shah, Yatrik Ashish | |
| 700 | 1 | |a Vemuri, Sai Manohar | |
| 700 | 1 | |a Choi Kyuwon (Ken) | |
| 773 | 0 | |t Electronics |g vol. 14, no. 16 (2025), p. 3317-3329 | |
| 786 | 0 | |d ProQuest |t Advanced Technologies & Aerospace Database | |
| 856 | 4 | 1 | |3 Citation/Abstract |u https://www.proquest.com/docview/3244013093/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch |
| 856 | 4 | 0 | |3 Full Text + Graphics |u https://www.proquest.com/docview/3244013093/fulltextwithgraphics/embedded/L8HZQI7Z43R0LA5T?source=fedsrch |
| 856 | 4 | 0 | |3 Full Text - PDF |u https://www.proquest.com/docview/3244013093/fulltextPDF/embedded/L8HZQI7Z43R0LA5T?source=fedsrch |