UniForm: A Reuse Attention Mechanism Optimized for Efficient Vision Transformers on Edge Devices

Bibliographic Details
Published in: arXiv.org (Dec 3, 2024), p. n/a
Main author: Seul-Ki Yeom
Other authors: Tae-Ho Kim
Published: Cornell University Library, arXiv.org
Subjects: Accuracy; Attention; Memory tasks; Memory devices; Flash memory (computers); Platforms; Real time; Task complexity; Inference
Online access: Citation/Abstract; Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 3140661897
003 UK-CbPIL
022 |a 2331-8422 
035 |a 3140661897 
045 0 |b d20241203 
100 1 |a Seul-Ki Yeom 
245 1 |a UniForm: A Reuse Attention Mechanism Optimized for Efficient Vision Transformers on Edge Devices 
260 |b Cornell University Library, arXiv.org  |c Dec 3, 2024 
513 |a Working Paper 
520 3 |a Transformer-based architectures have demonstrated remarkable success across various domains, but their deployment on edge devices remains challenging due to high memory and computational demands. In this paper, we introduce a novel Reuse Attention mechanism, tailored for efficient memory access and computational optimization, enabling seamless operation on resource-constrained platforms without compromising performance. Unlike traditional multi-head attention (MHA), which redundantly computes separate attention matrices for each head, Reuse Attention consolidates these computations into a shared attention matrix, significantly reducing memory overhead and computational complexity. Comprehensive experiments on ImageNet-1K and downstream tasks show that the proposed UniForm models leveraging Reuse Attention achieve state-of-the-art ImageNet classification accuracy while outperforming existing attention mechanisms, such as Linear Attention and Flash Attention, in inference speed and memory scalability. Notably, UniForm-l achieves 76.7% Top-1 accuracy on ImageNet-1K with 21.8 ms inference time on edge devices like the Jetson AGX Orin, representing up to a 5x speedup over competing benchmark methods. These results demonstrate the versatility of Reuse Attention across high-performance GPUs and edge platforms, paving the way for broader real-time applications. (A minimal sketch of the shared-attention computation follows the record below.)
653 |a Accuracy 
653 |a Attention 
653 |a Memory tasks 
653 |a Memory devices 
653 |a Flash memory (computers) 
653 |a Platforms 
653 |a Real time 
653 |a Task complexity 
653 |a Inference 
700 1 |a Kim, Tae-Ho 
773 0 |t arXiv.org  |g (Dec 3, 2024), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3140661897/abstract/embedded/ZKJTFFSVAI7CB62C?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2412.02344
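
The abstract (field 520) contrasts standard multi-head attention, which computes a separate attention matrix for every head, with Reuse Attention, which shares one attention matrix across all heads. The PyTorch sketch below illustrates only that shared-matrix idea as the abstract describes it; it is not the authors' implementation, and the module name, the single-head query/key projections, and the per-head value layout are illustrative assumptions.

    import torch
    import torch.nn as nn

    class ReuseAttentionSketch(nn.Module):
        """Sketch: one attention matrix shared across heads (a hypothetical
        reading of the abstract), versus MHA's per-head attention matrices."""
        def __init__(self, dim: int, num_heads: int):
            super().__init__()
            assert dim % num_heads == 0
            self.num_heads = num_heads
            self.head_dim = dim // num_heads
            # Single query/key projection -> a single shared attention matrix
            # (assumption; standard MHA would project to num_heads * head_dim).
            self.q = nn.Linear(dim, self.head_dim)
            self.k = nn.Linear(dim, self.head_dim)
            # Values remain per-head, as in ordinary MHA.
            self.v = nn.Linear(dim, dim)
            self.proj = nn.Linear(dim, dim)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, tokens, dim)
            B, N, _ = x.shape
            q = self.q(x)                                  # (B, N, head_dim)
            k = self.k(x)                                  # (B, N, head_dim)
            # One (N x N) attention matrix, computed once for all heads.
            attn = (q @ k.transpose(-2, -1)) * self.head_dim ** -0.5
            attn = attn.softmax(dim=-1)                    # (B, N, N)
            v = self.v(x).reshape(B, N, self.num_heads, self.head_dim)
            v = v.permute(0, 2, 1, 3)                      # (B, H, N, head_dim)
            # Broadcast the shared matrix over the head dimension.
            out = attn.unsqueeze(1) @ v                    # (B, H, N, head_dim)
            out = out.transpose(1, 2).reshape(B, N, -1)    # (B, N, dim)
            return self.proj(out)

    # Example: 196 tokens (14 x 14 patches) with an embedding width of 192.
    x = torch.randn(1, 196, 192)
    y = ReuseAttentionSketch(dim=192, num_heads=4)(x)      # -> (1, 196, 192)

Compared with MHA, the QK^T product and the softmax are evaluated once rather than once per head, which matches the memory and compute saving the abstract attributes to Reuse Attention.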