A Multiplierless Architecture for Image Convolution in Memory

Saved in:
Bibliographic Details
Published in: Journal of Low Power Electronics and Applications vol. 15, no. 4 (2025), p. 63-79
Main Author: Reuben, John
Other Authors: Zeller, Felix; Seiler, Benjamin; Fey, Dietmar
Published:
MDPI AG
Subjects:
Online Access: Citation/Abstract
Full Text + Graphics
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3286310447
003 UK-CbPIL
022 |a 2079-9268 
024 7 |a 10.3390/jlpea15040063  |2 doi 
035 |a 3286310447 
045 2 |b d20251001  |b d20251231 
084 |a 231478  |2 nlm 
100 1 |a Reuben, John 
245 1 |a A Multiplierless Architecture for Image Convolution in Memory 
260 |b MDPI AG  |c 2025 
513 |a Journal Article 
520 3 |a Image convolution is a commonly required task in machine vision and Convolutional Neural Networks (CNNs). Due to the large data movement required, image convolution can benefit greatly from in-memory computing. However, image convolution is very computationally intensive, requiring (n−(k−1))^2 Inner Product (IP) computations for convolution of an n×n image with a k×k kernel. For example, for a convolution of a 224 × 224 image with a 3 × 3 kernel, 49,284 IPs need to be computed, where each IP requires nine multiplications and eight additions. This is a major hurdle for in-memory implementation because in-memory adders and multipliers are extremely slow compared to CMOS multipliers. In this work, we revive an old technique called ‘Distributed Arithmetic’ and judiciously apply it to perform image convolution in memory without area-intensive hard-wired multipliers. Distributed arithmetic performs multiplication using shift-and-add operations, which are implemented using CMOS circuits in the periphery of ReRAM memory. Compared to Google’s TPU, our in-memory architecture requires 56× less energy while incurring 24× more latency for convolution of a 224 × 224 image with a 3 × 3 filter. 
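The abstract's two quantitative claims (the (n−(k−1))^2 IP count and multiplierless shift-and-add computation) can be illustrated with a minimal Python sketch of a bit-serial distributed-arithmetic inner product. This is an illustration only, not the paper's ReRAM-periphery implementation; the function name and the assumption of 8-bit unsigned pixels are mine:

```python
def da_inner_product(weights, pixels, bits=8):
    """Compute sum(w * p) with no multiplications, in the style of
    distributed arithmetic: walk the pixel bit-planes and, for each bit
    position b, add the sum of the weights whose pixel has bit b set,
    shifted left by b. Assumes unsigned `bits`-wide pixel values."""
    acc = 0
    for b in range(bits):
        # Partial sum selected by bit-plane b of the inputs (adds only).
        partial = sum(w for w, p in zip(weights, pixels) if (p >> b) & 1)
        acc += partial << b  # shift-and-add accumulation
    return acc

# IP count for an n x n image convolved with a k x k kernel,
# matching the abstract's (n - (k - 1))^2 formula.
def ip_count(n, k):
    return (n - (k - 1)) ** 2
```

For a 3 × 3 kernel this replaces the nine multiplications of one IP with eight bit-plane passes of adds and shifts, and `ip_count(224, 3)` reproduces the 49,284 IPs cited in the abstract.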
653 |a Multiplication 
653 |a CMOS 
653 |a Images 
653 |a Computer memory 
653 |a Machine vision 
653 |a Computer architecture 
653 |a Arithmetic 
653 |a Multipliers 
653 |a Artificial neural networks 
653 |a Convolution 
653 |a Network latency 
700 1 |a Zeller, Felix 
700 1 |a Seiler, Benjamin 
700 1 |a Fey, Dietmar 
773 0 |t Journal of Low Power Electronics and Applications  |g vol. 15, no. 4 (2025), p. 63-79 
786 0 |d ProQuest  |t Advanced Technologies & Aerospace Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3286310447/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full Text + Graphics  |u https://www.proquest.com/docview/3286310447/fulltextwithgraphics/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3286310447/fulltextPDF/embedded/L8HZQI7Z43R0LA5T?source=fedsrch