Merino: Entropy-driven Design for Generative Language Models on IoT Devices
Saved:
| Published: | arXiv.org (Dec 10, 2024), p. n/a |
|---|---|
| Author: | Zhao, Youpeng |
| Other Authors: | Lin, Ming; Tang, Huadong; Wu, Qiang; Wang, Jun |
| Publication Info: | Cornell University Library, arXiv.org |
| Subjects: | Mathematical programming; Computing costs; Decoders; Entropy; Autoregressive models; Internet of Things; Large language models; Artificial intelligence; Transformers |
| Online Access: | Citation/Abstract; Full text outside of ProQuest |
MARC
| LEADER | 00000nab a2200000uu 4500 | ||
|---|---|---|---|
| 001 | 2956943530 | ||
| 003 | UK-CbPIL | ||
| 022 | |a 2331-8422 | ||
| 035 | |a 2956943530 | ||
| 045 | 0 | |b d20241210 | |
| 100 | 1 | |a Zhao, Youpeng | |
| 245 | 1 | |a Merino: Entropy-driven Design for Generative Language Models on IoT Devices | |
| 260 | |b Cornell University Library, arXiv.org |c Dec 10, 2024 | ||
| 513 | |a Working Paper | ||
| 520 | 3 | |a Generative Large Language Models (LLMs) stand as a revolutionary advancement in the modern era of artificial intelligence (AI). However, scaling down LLMs for resource-constrained hardware, such as Internet-of-Things (IoT) devices, requires non-trivial effort and domain knowledge. In this paper, we propose a novel information-entropy framework for designing mobile-friendly generative language models. The whole design procedure involves solving a mathematical programming (MP) problem, which can be done on the CPU within minutes, making it nearly zero-cost. We evaluate our designed models, termed MeRino, across fourteen NLP downstream tasks, showing their competitive performance against state-of-the-art autoregressive transformer models in the mobile setting. Notably, MeRino achieves similar or better performance on both language modeling and zero-shot learning tasks compared to the 350M-parameter OPT, while being 4.9x faster on the NVIDIA Jetson Nano with a 5.5x reduction in model size. | |
| 653 | |a Mathematical programming | ||
| 653 | |a Computing costs | ||
| 653 | |a Decoders | ||
| 653 | |a Entropy | ||
| 653 | |a Autoregressive models | ||
| 653 | |a Internet of Things | ||
| 653 | |a Large language models | ||
| 653 | |a Artificial intelligence | ||
| 653 | |a Transformers | ||
| 700 | 1 | |a Lin, Ming | |
| 700 | 1 | |a Tang, Huadong | |
| 700 | 1 | |a Wu, Qiang | |
| 700 | 1 | |a Wang, Jun | |
| 773 | 0 | |t arXiv.org |g (Dec 10, 2024), p. n/a | |
| 786 | 0 | |d ProQuest |t Engineering Database | |
| 856 | 4 | 1 | |3 Citation/Abstract |u https://www.proquest.com/docview/2956943530/abstract/embedded/ITVB7CEANHELVZIZ?source=fedsrch |
| 856 | 4 | 0 | |3 Full text outside of ProQuest |u http://arxiv.org/abs/2403.07921 |