A Multiplierless Architecture for Image Convolution in Memory

Published in: Journal of Low Power Electronics and Applications, vol. 15, no. 4 (2025), pp. 63-79
Author: Reuben, John
Other Authors: Zeller, Felix; Seiler, Benjamin; Fey, Dietmar
Publisher: MDPI AG
Other Information
Abstract: Image convolution is a commonly required task in machine vision and Convolutional Neural Networks (CNNs). Because of the large data movement involved, image convolution can benefit greatly from in-memory computing. However, image convolution is very computationally intensive, requiring $(n-(k-1))^2$ Inner Product (IP) computations to convolve an $n \times n$ image with a $k \times k$ kernel. For example, convolving a 224 × 224 image with a 3 × 3 kernel requires 49,284 IPs, where each IP requires nine multiplications and eight additions. This is a major hurdle for in-memory implementation because in-memory adders and multipliers are extremely slow compared to CMOS multipliers. In this work, we revive an old technique called ‘Distributed Arithmetic’ and judiciously apply it to perform image convolution in memory without area-intensive hard-wired multipliers. Distributed Arithmetic performs multiplication using shift-and-add operations, which are implemented using CMOS circuits in the periphery of the ReRAM memory. Compared to Google’s TPU, our in-memory architecture requires 56× less energy while incurring 24× more latency for convolution of a 224 × 224 image with a 3 × 3 filter.
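The shift-and-add scheme behind Distributed Arithmetic can be sketched in a few lines of Python. The snippet below is a minimal illustration under simplifying assumptions (unsigned 8-bit pixels, integer kernel weights, plain software in place of the ReRAM array and its CMOS periphery), not the architecture described in the paper: it computes one Inner Product bit-serially, so no multiplier is ever invoked.

# Minimal sketch of distributed arithmetic: one inner product via
# shifts and adds only. Assumptions (not from the paper): unsigned
# 8-bit pixels, integer kernel weights.
def da_inner_product(kernel, window, nbits=8):
    """Compute sum(k_i * x_i) without a multiplier, using the identity
    sum_i k_i * x_i = sum_b 2^b * sum_i k_i * bit(x_i, b):
    for each bit plane b, add up the kernel weights whose pixel has
    bit b set, then shift the partial sum left by b and accumulate."""
    acc = 0
    for b in range(nbits):
        partial = sum(k for k, x in zip(kernel, window) if (x >> b) & 1)
        acc += partial << b  # shift-and-add; no multiply anywhere
    return acc

# One 3x3 IP (nine multiplications and eight additions in direct form)
# reproduced with shifts and adds only; the kernel here is hypothetical.
kernel = [1, 2, 1, 2, 4, 2, 1, 2, 1]
window = [10, 20, 30, 40, 50, 60, 70, 80, 90]
assert da_inner_product(kernel, window) == sum(k * x for k, x in zip(kernel, window))

Because the kernel is fixed across all $(n-(k-1))^2$ windows, the per-bit-plane partial sums depend only on which pixel bits are set, which is what makes the technique attractive for a memory array that can read out bit planes in parallel.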
ISSN: 2079-9268
DOI: 10.3390/jlpea15040063
Source: Advanced Technologies & Aerospace Database