MeRino: Entropy-driven Design for Generative Language Models on IoT Devices

Saved in:
Bibliographic Details
Published in: arXiv.org (Dec 10, 2024), p. n/a
Main Author: Zhao, Youpeng
Other Authors: Lin, Ming; Tang, Huadong; Wu, Qiang; Wang, Jun
Published:
Cornell University Library, arXiv.org
Conditions:
Online Access: Citation/Abstract
Full text outside of ProQuest
Description
Abstract: Generative Large Language Models (LLMs) stand as a revolutionary advancement in the modern era of artificial intelligence (AI). However, scaling down LLMs for resource-constrained hardware, such as Internet-of-Things (IoT) devices, requires non-trivial effort and domain knowledge. In this paper, we propose a novel information-entropy framework for designing mobile-friendly generative language models. The whole design procedure involves solving a mathematical programming (MP) problem, which can be done on the CPU within minutes, making it nearly zero-cost. We evaluate our designed models, termed MeRino, across fourteen NLP downstream tasks, showing their competitive performance against state-of-the-art autoregressive transformer models under the mobile setting. Notably, MeRino achieves similar or better performance on both language modeling and zero-shot learning tasks compared to the 350M-parameter OPT, while being 4.9x faster on NVIDIA Jetson Nano with a 5.5x reduction in model size.
ISSN:2331-8422
Source: Engineering Database