UniWav: Towards Unified Pre-Training for Speech Representation Learning and Generation

Abstract

Pre-training and representation learning play an increasingly important role in modern speech processing. Nevertheless, different applications rely on different foundation models, since predominant pre-training techniques are designed for either discriminative or generative tasks. In this work, we make the first attempt at building a unified pre-training framework for both types of tasks in speech. We show that, with appropriate design choices for pre-training, one can jointly learn a representation encoder and a generative audio decoder that can be applied to both types of tasks. We propose UniWav, an encoder-decoder framework designed to unify pre-training for representation learning and generation. On speech recognition, text-to-speech, and speech tokenization, UniWav achieves performance comparable to that of existing foundation models, each trained on a specific task. Our findings suggest that a single general-purpose foundation model for speech can replace multiple task-specific foundation models, reducing the overhead and cost of pre-training.
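
The abstract describes the framework only at a high level, so the following is a minimal, hypothetical sketch of what jointly pre-training a representation encoder and a generative decoder can look like. The module layout, masking scheme, and L1 losses are illustrative assumptions for exposition, not UniWav's actual architecture or training objectives.

import torch
import torch.nn as nn
import torch.nn.functional as F

class JointPretrainSketch(nn.Module):
    """Toy joint pre-training of a representation encoder and a feature
    decoder. Illustrative only; the real UniWav design differs."""

    def __init__(self, feat_dim=80, d_model=256, n_layers=4, mask_ratio=0.3):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.in_proj = nn.Linear(feat_dim, d_model)
        self.mask_emb = nn.Parameter(torch.zeros(d_model))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=n_layers,
        )
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=n_layers,
        )
        self.enc_head = nn.Linear(d_model, feat_dim)  # masked-prediction head
        self.dec_head = nn.Linear(d_model, feat_dim)  # reconstruction head

    def forward(self, feats):  # feats: (batch, time, feat_dim)
        h = self.in_proj(feats)
        # Randomly mask a fraction of frames, replacing them with a
        # learned mask embedding (a common self-supervised recipe).
        mask = torch.rand(h.shape[:2], device=h.device) < self.mask_ratio
        h = torch.where(mask.unsqueeze(-1), self.mask_emb.expand_as(h), h)
        z = self.encoder(h)  # shared representations for both objectives
        # Discriminative-style objective: predict features at masked frames.
        rep_loss = F.l1_loss(self.enc_head(z)[mask], feats[mask])
        # Generative-style objective: decode the full feature sequence
        # from the learned representations.
        gen_loss = F.l1_loss(self.dec_head(self.decoder(z)), feats)
        return rep_loss + gen_loss

# Usage: one joint pre-training step on random stand-in "log-mel" features.
model = JointPretrainSketch()
loss = model(torch.randn(2, 100, 80))
loss.backward()

Because both losses backpropagate through the same encoder, its representations are shaped by the discriminative masked-prediction signal and the generative reconstruction signal at once, which is the unification the paper argues for.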

Cite

Text

Liu et al. "UniWav: Towards Unified Pre-Training for Speech Representation Learning and Generation." International Conference on Learning Representations, 2025.

Markdown

[Liu et al. "UniWav: Towards Unified Pre-Training for Speech Representation Learning and Generation." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/liu2025iclr-uniwav/)

BibTeX

@inproceedings{liu2025iclr-uniwav,
  title     = {{UniWav: Towards Unified Pre-Training for Speech Representation Learning and Generation}},
  author    = {Liu, Alexander H. and Lee, Sang-gil and Yang, Chao-Han Huck and Gong, Yuan and Wang, Yu-Chiang Frank and Glass, James R. and Valle, Rafael and Catanzaro, Bryan},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/liu2025iclr-uniwav/}
}