LLaVA-FA: Learning Fourier Approximation for Compressing Large Multimodal Models

Zheng, Pengcheng; Zhang, Chaoning; Mo, Jiarong; Li, GuoHui; Zhang, Jiaquan; Zhang, Jiahao; Cao, Sihan; Zheng, Sheng; Qin, Caiyan; Wang, Guoqing; Yang, Yang

ICLR 2026

Abstract

Large multimodal models (LMMs) have achieved impressive performance on various vision-language tasks, but their substantial computational and memory costs hinder their practical deployment. Existing compression methods often decouple low-rank decomposition and quantization, leading to compounded reconstruction errors, especially in multimodal architectures with cross-modal redundancy. To address this issue, we propose LLaVA-FA, a novel efficient LMM that performs joint low-rank plus quantization approximation in the frequency domain. By leveraging the de-correlation and conjugate symmetry properties of the Fourier transform, LLaVA-FA achieves more compact and accurate weight representations. Furthermore, we introduce PolarQuant, a polar-coordinate quantization method tailored for complex matrices, and an optional diagonal calibration (ODC) scheme that eliminates the need for large-scale calibration data. Extensive experimental results demonstrate that our proposed LLaVA-FA outperforms existing efficient multimodal models across multiple benchmarks while maintaining minimal activated parameters and low computational costs, validating its effectiveness as a powerful solution for compressing LMMs.
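
The two frequency-domain ideas named in the abstract can be illustrated with a small NumPy sketch. It is not the paper's implementation: the functions `polar_quantize` and `polar_dequantize`, the uniform magnitude and phase grids, the bit widths, and the random 8x16 matrix are all assumptions chosen for illustration, and the abstract does not specify how PolarQuant actually works. The sketch only shows (1) that the Fourier transform of a real matrix is conjugate-symmetric, so roughly half of the coefficients suffice, and (2) what quantizing complex coefficients in polar coordinates (magnitude and phase) can look like.

```python
import numpy as np

def polar_quantize(z, mag_bits=8, phase_bits=8):
    """Toy polar quantizer (hypothetical, not the paper's PolarQuant):
    uniform grid over magnitude in [0, max|z|] and over phase in [-pi, pi)."""
    mag = np.abs(z)
    phase = np.angle(z)                      # values in (-pi, pi]
    mag_max = float(mag.max()) or 1.0        # avoid division by zero
    mag_levels = 2 ** mag_bits - 1
    phase_levels = 2 ** phase_bits
    mag_idx = np.round(mag / mag_max * mag_levels).astype(np.int64)
    # Phase is periodic, so the top grid point wraps around to index 0.
    phase_idx = np.round((phase + np.pi) / (2 * np.pi) * phase_levels)
    phase_idx = phase_idx.astype(np.int64) % phase_levels
    return mag_idx, phase_idx, mag_max

def polar_dequantize(mag_idx, phase_idx, mag_max, mag_bits=8, phase_bits=8):
    """Map integer codes back to complex values."""
    mag = mag_idx / (2 ** mag_bits - 1) * mag_max
    phase = phase_idx / (2 ** phase_bits) * 2 * np.pi - np.pi
    return mag * np.exp(1j * phase)

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))             # stand-in for a real weight matrix

# Conjugate symmetry: for real input, rfft2 keeps only 16 // 2 + 1 = 9 columns,
# and the original matrix is still exactly recoverable from them.
F_full = np.fft.fft2(W)
F_half = np.fft.rfft2(W)
W_back = np.fft.irfft2(F_half, s=W.shape)
roundtrip_error = np.max(np.abs(W - W_back))

# Quantize the retained complex coefficients in polar form, then reconstruct.
mag_idx, phase_idx, mag_max = polar_quantize(F_half)
F_hat = polar_dequantize(mag_idx, phase_idx, mag_max)
W_hat = np.fft.irfft2(F_hat, s=W.shape)
rel_error = np.linalg.norm(W - W_hat) / np.linalg.norm(W)

print("kept coefficients:", F_half.shape, "of", F_full.shape)
print("lossless round-trip error:", roundtrip_error)
print("relative error after polar quantization:", rel_error)
```

Lowering `mag_bits` or `phase_bits` in this sketch trades reconstruction accuracy for fewer bits per coefficient. The joint low-rank component and the calibration scheme described in the abstract are not modeled here.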

Cite

Text

Zheng et al. "LLaVA-FA: Learning Fourier Approximation for Compressing Large Multimodal Models." International Conference on Learning Representations, 2026.

Markdown

[Zheng et al. "LLaVA-FA: Learning Fourier Approximation for Compressing Large Multimodal Models." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zheng2026iclr-llavafa/)

BibTeX

@inproceedings{zheng2026iclr-llavafa,
  title     = {{LLaVA-FA: Learning Fourier Approximation for Compressing Large Multimodal Models}},
  author    = {Zheng, Pengcheng and Zhang, Chaoning and Mo, Jiarong and Li, GuoHui and Zhang, Jiaquan and Zhang, Jiahao and Cao, Sihan and Zheng, Sheng and Qin, Caiyan and Wang, Guoqing and Yang, Yang},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/zheng2026iclr-llavafa/}
}