Merino: Entropy-driven Design for Generative Language Models on IoT Devices

Saved in:
Detailed Bibliography
Published: arXiv.org (Dec 10, 2024), p. n/a
Author: Zhao, Youpeng
Other Authors: Lin, Ming; Tang, Huadong; Wu, Qiang; Wang, Jun
Imprint/Publisher:
Cornell University Library, arXiv.org
Subjects: Mathematical programming; Computing costs; Decoders; Entropy; Autoregressive models; Internet of Things; Large language models; Artificial intelligence; Transformers
Online Access: Citation/Abstract
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 2956943530
003 UK-CbPIL
022 |a 2331-8422 
035 |a 2956943530 
045 0 |b d20241210 
100 1 |a Zhao, Youpeng 
245 1 |a Merino: Entropy-driven Design for Generative Language Models on IoT Devices 
260 |b Cornell University Library, arXiv.org  |c Dec 10, 2024 
513 |a Working Paper 
520 3 |a Generative Large Language Models (LLMs) stand as a revolutionary advancement in the modern era of artificial intelligence (AI). However, scaling down LLMs for resource-constrained hardware, such as Internet-of-Things (IoT) devices, requires non-trivial efforts and domain knowledge. In this paper, we propose a novel information-entropy framework for designing mobile-friendly generative language models. The whole design procedure involves solving a mathematical programming (MP) problem, which can be done on the CPU within minutes, making it nearly zero-cost. We evaluate our designed models, termed MeRino, across fourteen NLP downstream tasks, showing their competitive performance against the state-of-the-art autoregressive transformer models under the mobile setting. Notably, MeRino achieves similar or better performance on both language modeling and zero-shot learning tasks, compared to the 350M parameter OPT, while being 4.9x faster on NVIDIA Jetson Nano with a 5.5x reduction in model size. 
653 |a Mathematical programming 
653 |a Computing costs 
653 |a Decoders 
653 |a Entropy 
653 |a Autoregressive models 
653 |a Internet of Things 
653 |a Large language models 
653 |a Artificial intelligence 
653 |a Transformers 
700 1 |a Lin, Ming 
700 1 |a Tang, Huadong 
700 1 |a Wu, Qiang 
700 1 |a Wang, Jun 
773 0 |t arXiv.org  |g (Dec 10, 2024), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/2956943530/abstract/embedded/ITVB7CEANHELVZIZ?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2403.07921
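
The abstract (field 520) states that MeRino's design procedure reduces to a mathematical programming problem solvable on a CPU within minutes. The paper's actual entropy objective and constraints are not given in this record; the sketch below is only a rough illustration of what an entropy-guided, budget-constrained architecture search of that kind could look like. The entropy proxy, the parameter-count formula, and the 64M-parameter budget are all assumptions made for illustration, not MeRino's formulation.

    # Illustrative sketch only: entropy-guided architecture selection under a
    # parameter budget, solved by brute-force enumeration on the CPU.
    # entropy_proxy(), param_count(), and the budget are hypothetical stand-ins.
    import math
    from itertools import product

    def param_count(depth: int, width: int, vocab: int = 50272) -> int:
        # Rough decoder-only transformer size: token embeddings plus
        # per-layer attention and MLP weight matrices (~12 * width^2 each).
        return vocab * width + depth * 12 * width * width

    def entropy_proxy(depth: int, width: int) -> float:
        # Hypothetical "information entropy" score: each layer contributes
        # logarithmically in its width, summed over depth.
        return depth * math.log(width)

    def search(budget_params: int, depths, widths):
        # Maximize the entropy proxy subject to the parameter budget.
        best = None
        for d, w in product(depths, widths):
            if param_count(d, w) > budget_params:
                continue
            score = entropy_proxy(d, w)
            if best is None or score > best[0]:
                best = (score, d, w)
        return best

    if __name__ == "__main__":
        # ~64M-parameter budget as a mobile-friendly target, well under OPT-350M.
        result = search(64_000_000, depths=range(2, 25), widths=range(128, 1025, 64))
        if result:
            score, depth, width = result
            print(f"depth={depth} width={width} "
                  f"params={param_count(depth, width):,} score={score:.2f}")

Because the search space here is tiny, exhaustive enumeration finishes in well under a second; the point is only to show how a closed-form objective plus a hardware budget can turn model sizing into a cheap optimization problem, as the abstract describes.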