Tiny but Mighty: A Software-Hardware Co-Design Approach for Efficient Multimodal Inference on Battery-Powered Small Devices

Abstract

Large Multimodal Models (LMMs) are inherently modular, comprising vision and audio encoders, a projector, and a language backbone. Yet existing systems execute them monolithically, underutilizing the heterogeneous accelerators (NPUs, GPUs, DSPs) on modern SoCs and inflating end-to-end latency. We present Nanomind, a hardware-software co-design inference framework that decomposes each LMM into modular "bricks" (vision, projector, language, and audio) and maps each brick to its best-suited compute unit. A Token-Aware Buffer Manager (TABM) enables zero-copy embedding transfer across accelerators on unified-memory SoCs, bypassing CPU bottlenecks. Combined with customized hardware, a battery-aware scheduler, and fused low-bit GEMM kernels, Nanomind runs entirely on a compact, battery-powered prototype that operates fully offline. Nanomind reduces end-to-end energy by 42.3% relative to mainstream edge frameworks and devkits; in its on-demand low-power mode, the prototype runs LLaVA-OneVision-Qwen2-0.5B with a camera for about 18.8 hours on a single 2,000 mAh battery.
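
To put the battery figure in perspective, the short sketch below works out the average current draw implied by a 2,000 mAh battery lasting 18.8 hours. The two capacity and runtime numbers come from the abstract; the 3.7 V nominal cell voltage is an assumption (typical for a single lithium-ion cell, but not stated in the abstract), so the resulting power estimate is only a rough guide and not a figure reported by the authors. The function names are illustrative.

```python
def average_current_ma(capacity_mah: float, runtime_h: float) -> float:
    """Average current (mA) that drains capacity_mah over runtime_h hours."""
    if runtime_h <= 0:
        raise ValueError("runtime_h must be positive")
    return capacity_mah / runtime_h


def average_power_w(current_ma: float, voltage_v: float) -> float:
    """Average power (W) for a given current (mA) at a given voltage (V)."""
    return current_ma / 1000.0 * voltage_v


CAPACITY_MAH = 2000.0      # from the abstract
RUNTIME_H = 18.8           # from the abstract (on-demand low-power mode)
NOMINAL_VOLTAGE_V = 3.7    # ASSUMPTION: typical Li-ion cell, not from the abstract

current = average_current_ma(CAPACITY_MAH, RUNTIME_H)
power = average_power_w(current, NOMINAL_VOLTAGE_V)
print(f"{current:.1f} mA average, about {power:.2f} W at {NOMINAL_VOLTAGE_V} V")
```

Under these assumptions the whole device, camera included, would average roughly 106 mA. The estimate ignores converter losses and the gap between rated and usable battery capacity.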

Cite

Text

Li et al. "Tiny but Mighty: A Software-Hardware Co-Design Approach for Efficient Multimodal Inference on Battery-Powered Small Devices." International Conference on Learning Representations, 2026.

Markdown

[Li et al. "Tiny but Mighty: A Software-Hardware Co-Design Approach for Efficient Multimodal Inference on Battery-Powered Small Devices." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/li2026iclr-tiny/)

BibTeX

@inproceedings{li2026iclr-tiny,
  title     = {{Tiny but Mighty: A Software-Hardware Co-Design Approach for Efficient Multimodal Inference on Battery-Powered Small Devices}},
  author    = {Li, Yilong and Zeng, Yijing and Zhang, Shuai and Zhang, Hao and Xiong, Xinmiao and Liu, Jingyu and Hu, Pan and Banerjee, Suman},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/li2026iclr-tiny/}
}