MeRino: Entropy-Driven Design for Generative Language Models on IoT Devices
Abstract
Generative Large Language Models (LLMs) stand as a revolutionary advancement in the modern era of artificial intelligence (AI). However, scaling down LLMs for resource-constrained hardware, such as Internet-of-Things (IoT) devices, requires non-trivial effort and domain knowledge. In this paper, we propose a novel information-entropy framework for designing mobile-friendly generative language models. The whole design procedure involves solving a mathematical programming (MP) problem, which can be done on the CPU within minutes, making it nearly zero-cost. We evaluate our designed models, termed MeRino, across fourteen NLP downstream tasks, showing their competitive performance against state-of-the-art autoregressive transformer models under the mobile setting. Notably, MeRino achieves similar or better performance on both language modeling and zero-shot learning tasks compared to the 350M-parameter OPT, while being 4.9x faster on the NVIDIA Jetson Nano with a 5.5x reduction in model size.
Cite
Text
Zhao et al. "MeRino: Entropy-Driven Design for Generative Language Models on IoT Devices." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I21.34445

Markdown
[Zhao et al. "MeRino: Entropy-Driven Design for Generative Language Models on IoT Devices." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/zhao2025aaai-merino/) doi:10.1609/AAAI.V39I21.34445

BibTeX
@inproceedings{zhao2025aaai-merino,
title = {{MeRino: Entropy-Driven Design for Generative Language Models on IoT Devices}},
author = {Zhao, Youpeng and Lin, Ming and Tang, Huadong and Wu, Qiang and Wang, Jun},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2025},
  pages = {22840--22848},
doi = {10.1609/AAAI.V39I21.34445},
url = {https://mlanthology.org/aaai/2025/zhao2025aaai-merino/}
}