LLMPi: Optimizing LLMs for High-Throughput on Raspberry Pi

Ardakani, Mahsa; Malekar, Jinendra; Zand, Ramtin

CVPRW 2025 pp. 6379-6388

Abstract

Deploying Large Language Models (LLMs) on resource-constrained edge devices like the Raspberry Pi presents challenges in computational efficiency, power consumption, and response latency. This paper explores quantization-based optimization techniques to enable high-throughput, energy-efficient execution of LLMs on low-power embedded systems. Our approach leverages k-quantization, a Post-Training Quantization (PTQ) method designed for different bit-widths, enabling efficient 2-bit, 4-bit, 6-bit, and 8-bit weight quantization. Additionally, we employ ternary quantization using Quantization-Aware Training (QAT) for BitNet models, allowing for more effective adaptation to lower-bit representations while preserving accuracy. Our findings highlight the potential of quantized LLMs for real-time conversational AI on edge devices, paving the way for low-power, high-efficiency AI deployment in mobile and embedded applications. This study demonstrates that aggressive quantization strategies can significantly reduce energy consumption while maintaining inference quality, making LLMs practical for resource-limited environments.
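
To make the two quantization families named in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. `ternary_quantize` follows the absmean rule commonly described for BitNet-style ternary weights (scale by the mean absolute value, round, clip to -1, 0, +1). `blockwise_quantize` is a deliberately simplified stand-in for k-quantization: it uses one scale per block with symmetric round-to-nearest, and does not reproduce the super-block layout or mixed precision of real k-quant formats. All function names, the block size, and the sample weights are assumptions chosen for illustration.

```python
from typing import List, Tuple


def ternary_quantize(weights: List[float], eps: float = 1e-8) -> Tuple[List[int], float]:
    """Absmean ternary quantization: map each weight to -1, 0, or +1.

    Assumption: this mirrors the absmean rule described for BitNet-style
    models; in QAT it would be applied inside the training loop.
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def blockwise_quantize(
    weights: List[float], bits: int = 4, block_size: int = 32
) -> List[Tuple[List[int], float]]:
    """Simplified post-training quantization with one scale per block.

    Assumption: a stand-in for k-quantization, not the real k-quant format.
    """
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for 4-bit symmetric codes
    blocks = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        absmax = max(abs(w) for w in block)
        scale = absmax / qmax if absmax > 0 else 1.0
        codes = [max(-qmax, min(qmax, round(w / scale))) for w in block]
        blocks.append((codes, scale))
    return blocks


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Recover approximate weights from integer codes and a scale."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    sample = [0.9, -0.05, -1.2, 0.3]  # hypothetical weights
    codes, scale = ternary_quantize(sample)
    print("ternary codes:", codes, "scale:", scale)
    for block_codes, block_scale in blockwise_quantize(sample, bits=4, block_size=4):
        print("4-bit codes:", block_codes, "approx:", dequantize(block_codes, block_scale))
```

Lower bit-widths shrink the range of integer codes, so each weight costs less memory and bandwidth at the price of a larger rounding error. That is the trade-off between throughput, energy, and accuracy the abstract refers to.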

Cite

Text

Ardakani et al. "LLMPi: Optimizing LLMs for High-Throughput on Raspberry Pi." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.

Markdown

[Ardakani et al. "LLMPi: Optimizing LLMs for High-Throughput on Raspberry Pi." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.](https://mlanthology.org/cvprw/2025/ardakani2025cvprw-llmpi/)

BibTeX

@inproceedings{ardakani2025cvprw-llmpi,
  title     = {{LLMPi: Optimizing LLMs for High-Throughput on Raspberry Pi}},
  author    = {Ardakani, Mahsa and Malekar, Jinendra and Zand, Ramtin},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2025},
  pages     = {6379--6388},
  url       = {https://mlanthology.org/cvprw/2025/ardakani2025cvprw-llmpi/}
}