Freeze-Omni: A Smart and Low Latency Speech-to-Speech Dialogue Model with Frozen LLM

Abstract

GPT-4o’s excellent duplex speech interaction has given users an impressive experience, and researchers have recently proposed several multimodal LLMs to achieve user-agent speech-to-speech conversation. In this paper, we propose a novel speech-text multimodal LLM architecture called Freeze-Omni; our main contribution is that the speech input and output modalities can be connected to a textual LLM while keeping the LLM’s parameters frozen throughout training. This ensures that Freeze-Omni’s intelligence in the speech modality matches that of its backbone LLM in the text modality, while achieving low latency in end-to-end spoken responses. In addition, we design a method that achieves duplex dialogue through multitask training, giving Freeze-Omni a more natural conversational style between users and agents. In summary, Freeze-Omni holds great potential for speech-to-speech dialogue built on a multimodal LLM with a frozen backbone, avoiding the catastrophic forgetting caused by limited data and training resources.
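
To make the frozen-backbone idea concrete, below is a minimal PyTorch sketch (not the authors' released code) of the pattern the abstract describes: every parameter of the text LLM is frozen, and only a small speech adapter that maps encoder features into the LLM's embedding space receives gradients. The module names, dimensions, and loader are illustrative assumptions.

import torch
import torch.nn as nn

class SpeechAdapter(nn.Module):
    """Maps speech-encoder features into the LLM's embedding space.
    Hypothetical adapter; the paper's actual modules may differ."""
    def __init__(self, speech_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(speech_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, speech_feats: torch.Tensor) -> torch.Tensor:
        return self.proj(speech_feats)

def freeze_llm(llm: nn.Module) -> None:
    """Freeze every backbone parameter so training updates only the
    speech-side modules, preserving the LLM's text-modality ability."""
    for p in llm.parameters():
        p.requires_grad = False

# Usage sketch: only the adapter (and, analogously, a speech decoder,
# omitted here) is passed to the optimizer.
# llm = load_pretrained_text_llm()   # hypothetical loader
# freeze_llm(llm)
# adapter = SpeechAdapter(speech_dim=512, llm_dim=4096)
# optimizer = torch.optim.AdamW(adapter.parameters(), lr=1e-4)

Because the optimizer never sees the backbone's parameters, the LLM's text-side behavior cannot drift during speech training, which is the mechanism behind the abstract's claim of avoiding catastrophic forgetting.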

Cite

Text

Wang et al. "Freeze-Omni: A Smart and Low Latency Speech-to-Speech Dialogue Model with Frozen LLM." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Wang et al. "Freeze-Omni: A Smart and Low Latency Speech-to-Speech Dialogue Model with Frozen LLM." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/wang2025icml-freezeomni/)

BibTeX

@inproceedings{wang2025icml-freezeomni,
  title     = {{Freeze-Omni: A Smart and Low Latency Speech-to-Speech Dialogue Model with Frozen LLM}},
  author    = {Wang, Xiong and Li, Yangze and Fu, Chaoyou and Zhang, Yike and Shen, Yunhang and Xie, Lei and Li, Ke and Sun, Xing and Ma, Long},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {63345--63354},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/wang2025icml-freezeomni/}
}