FlexiCodec: A Dynamic Neural Audio Codec for Low Frame Rates
Abstract
Neural audio codecs are foundational to speech language models. They are expected to have a low frame rate and to decouple semantic and acoustic information. A lower-frame-rate codec reduces the computational cost of speech language models by shortening the sequence length. Recent studies have developed 12.5Hz low-frame-rate audio codecs, but codecs with even lower frame rates remain underexplored. We find that pushing existing audio codecs to very low frame rates loses much semantic information. We suggest that the limitations of low-frame-rate codecs lie in both insufficient semantic decoupling and insufficient temporal resolution for capturing transient phonetic details. This paper introduces **FlexiCodec** to address these limitations. FlexiCodec improves semantic preservation with a **dynamic frame rate** approach and introduces a novel architecture featuring **ASR feature-assisted dual-stream** encoding and Transformer bottlenecks. With dynamic frame rates, it uses fewer frames in information-sparse regions by adaptively merging semantically similar frames. A dynamic frame rate also allows FlexiCodec to support inference-time **controllable frame rates** between 3Hz and 12.5Hz. Experiments at average frame rates of **6.25Hz, 8.3Hz and 12.5Hz** confirm that FlexiCodec outperforms baseline systems in semantic information preservation and delivers high audio reconstruction quality. We also validate the effectiveness of FlexiCodec in language model-based TTS. Demos are available at: https://flexicodec.github.io. Code is available at: https://github.com/amphionteam/flexicodec.
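To make the idea of merging semantically similar frames concrete, the following is a minimal illustrative sketch, not the FlexiCodec implementation. It assumes that adjacent frames are compared by cosine similarity, that a run of similar frames is averaged into one frame, and that a threshold controls how aggressively frames are merged. The function names, the `threshold` and `max_span` parameters, and the toy 2-dimensional features are all hypothetical; the abstract does not specify the actual merging criterion or features.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def merge_similar_frames(frames, threshold=0.9, max_span=4):
    """Greedily merge runs of adjacent, similar frames (hypothetical criterion).

    A frame joins the current group when its cosine similarity to the
    previous frame is at least `threshold` and the group holds fewer than
    `max_span` frames. Each group is averaged into a single merged frame.
    Returns (merged_frames, spans), where spans[i] is the number of
    original frames covered by merged frame i.
    """
    if not frames:
        return [], []
    groups = [[frames[0]]]
    for prev, cur in zip(frames, frames[1:]):
        if cosine(prev, cur) >= threshold and len(groups[-1]) < max_span:
            groups[-1].append(cur)
        else:
            groups.append([cur])
    merged = [[sum(col) / len(g) for col in zip(*g)] for g in groups]
    spans = [len(g) for g in groups]
    return merged, spans


def average_frame_rate(base_rate_hz, num_original, num_merged):
    """Average frame rate after merging, given the fixed base frame rate."""
    return base_rate_hz * num_merged / num_original


# Toy features at an assumed 12.5Hz base rate: three similar frames,
# then two similar frames, then one distinct frame.
frames = [[1.0, 0.0], [1.0, 0.05], [0.98, 0.02], [0.0, 1.0], [0.05, 1.0], [1.0, 1.0]]
merged, spans = merge_similar_frames(frames, threshold=0.9)
rate = average_frame_rate(12.5, len(frames), len(merged))
print(spans, rate)
```

In this sketch, raising `threshold` merges fewer frames and moves the average rate back toward the base rate, which is one simple way an inference-time control over frame rate could be exposed. Since each merged frame becomes one token position, halving the average frame rate also halves the sequence length the language model has to process.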
Cite
Text
Li et al. "FlexiCodec: A Dynamic Neural Audio Codec for Low Frame Rates." International Conference on Learning Representations, 2026.
Markdown
[Li et al. "FlexiCodec: A Dynamic Neural Audio Codec for Low Frame Rates." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/li2026iclr-flexicodec/)
BibTeX
@inproceedings{li2026iclr-flexicodec,
title = {{FlexiCodec: A Dynamic Neural Audio Codec for Low Frame Rates}},
author = {Li, Jiaqi and Qian, Yao and Hu, Yuxuan and Zhang, Leying and Wang, Xiaofei and Lu, Heng and Thakker, Manthan and Li, Jinyu and Zhao, Sheng and Wu, Zhizheng},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/li2026iclr-flexicodec/}
}