Energy aware computer vision algorithm deployment on heterogeneous architectures

Bibliographic Details
Published in:	Discover Electronics vol. 2, no. 1 (Dec 2025), p. 42
Main Author:	Ali, Teymoor
Other Authors:	Bhowmik, Deepayan, Nicol, Robert
Published:	Springer Nature B.V.
Subjects:	Feature extraction; Central processing units > CPUs; Deep learning; Hardware; Vision systems; Communication; Bandwidths; Artificial neural networks; Real time; Task complexity; Computer vision; Field programmable gate arrays; Energy consumption; Partitioning; Accelerators; Scheduling; Embedded systems; Graphics processing units; Neural networks; Power management; Image classification; Design; Algorithms; Energy efficiency; Run time (computers)

Description
Abstract:	Computer vision algorithms, specifically convolutional neural networks (CNNs) and feature extraction algorithms, have become increasingly pervasive in many vision tasks. Growing algorithm complexity raises computational and memory requirements, which poses a challenge to embedded vision systems with limited resources. Heterogeneous architectures have recently gained momentum as a new path forward for energy efficiency and faster computation, as they allow for the effective utilisation of various processing units, such as the Central Processing Unit (CPU), Graphics Processing Unit (GPU), and Field Programmable Gate Array (FPGA), which are tightly integrated into a single platform to enhance system performance. However, partitioning algorithms over each accelerator requires careful consideration of hardware limitations and scheduling. We propose two heterogeneous systems, one low-power and one high-power, and a method of partitioning CNNs and a feature extraction algorithm (SIFT) onto the hardware. We benchmark feature detection and image classification algorithms on heterogeneous systems and their discrete accelerator counterparts. We demonstrate that both systems outperform FPGA/GPU-only accelerators. Experimental results show that for the SIFT algorithm, there is an 18% runtime improvement over the GPU. In the case of the MobileNetV2 and ResNet18 networks, the high-power system achieves 17.75%/5.55% runtime and 6.25%/2.08% energy improvements respectively, against their discrete counterparts. The low-power system achieves 6.32%/16.21% runtime and 7.32%/3.27% energy savings. The results show that effective partitioning and scheduling of imaging algorithms on heterogeneous systems is a step towards better efficiency over traditional FPGA/GPU-only accelerators.
ISSN:	2948-1600
DOI:	10.1007/s44291-025-00078-7
Source:	Engineering Database