Secure Outlier-Aware Large Language Model Inference

Zhao, Lifan; Fang, Zhixuan

ICLR 2026

Abstract

Secure multiparty computation (MPC) allows a client to run inference on sensitive inputs privately, without gaining access to the proprietary machine learning model weights. As decoder-only Transformer-based large language models (LLMs) have become the dominant paradigm, interest in applying MPC to LLMs is growing. However, such inference usually incurs high latency, which is due to the nonlinear operations in the Transformer architecture. Recent works focus either on improving cryptographic primitives or on re-architecting and retraining LLMs to make them MPC-friendly. We instead observe that properly addressing outlier phenomena, which are unique yet universal properties across different LLMs, can effectively reduce the input domain and thereby enable faster protocols for nonlinear operations. We therefore propose the Secure Outlier-Aware Large Language Model Inference framework (SOAL), which accelerates the RMSNorm operation by nearly $2\times$, SiLU by $2\times$, and Softmax by more than $5\times$. SOAL preserves the performance of the original model without requiring any fine-tuning.
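For orientation, the three nonlinear operations named in the abstract can be written in plaintext as below. This is only a minimal reference sketch of the standard definitions, assuming plain Python lists as inputs; the function names, the `eps` default, and the optional `weight` argument are illustrative choices, and nothing here reflects the SOAL protocols or any secret-shared computation.

```python
import math

def rms_norm(x, weight=None, eps=1e-6):
    """Standard RMSNorm: divide x by its root mean square, then scale by weight.

    `eps` and the all-ones default for `weight` are illustrative assumptions.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    if weight is None:
        weight = [1.0] * len(x)
    return [v * inv_rms * w for v, w in zip(x, weight)]

def silu(x):
    """Standard SiLU, x * sigmoid(x), applied elementwise."""
    out = []
    for v in x:
        if v >= 0:
            out.append(v / (1.0 + math.exp(-v)))
        else:
            # Equivalent form that avoids overflow in exp() for very negative v.
            e = math.exp(v)
            out.append(v * e / (1.0 + e))
    return out

def softmax(x):
    """Standard softmax; subtracting the maximum keeps exp() from overflowing."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]
```

Each function involves a reciprocal square root, an exponential, or a division, which are the kinds of operations that are cheap in plaintext but costly to evaluate under MPC.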

Cite

Text

Zhao and Fang. "Secure Outlier-Aware Large Language Model Inference." International Conference on Learning Representations, 2026.

Markdown

[Zhao and Fang. "Secure Outlier-Aware Large Language Model Inference." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zhao2026iclr-secure/)

BibTeX

@inproceedings{zhao2026iclr-secure,
  title     = {{Secure Outlier-Aware Large Language Model Inference}},
  author    = {Zhao, Lifan and Fang, Zhixuan},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/zhao2026iclr-secure/}
}