T2V2: A Unified Non-Autoregressive Model for Speech Recognition and Synthesis via Multitask Learning
Abstract
We introduce T2V2 (**T**ext to **V**oice and **V**oice to **T**ext), a unified non-autoregressive model that performs both automatic speech recognition (ASR) and text-to-speech (TTS) synthesis within the same framework. T2V2 uses a shared Conformer backbone with rotary positional embeddings to handle both tasks efficiently, with ASR trained using Connectionist Temporal Classification (CTC) loss and TTS using masked language modeling (MLM) loss. The model operates on discrete tokens, where speech tokens are generated by clustering features from a self-supervised learning model. To further enhance performance, we introduce two auxiliary tasks: CTC error correction, which refines raw ASR outputs using contextual information from speech embeddings, and unconditional speech MLM, which enables classifier-free guidance to improve TTS. Our method is self-contained: it uses intermediate CTC outputs to align text and speech with Monotonic Alignment Search, without relying on external aligners. We perform extensive experimental evaluation to verify the efficacy of the T2V2 framework, achieving state-of-the-art performance on the TTS task and competitive performance in discrete ASR.
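Two of the mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the blank id of 0, and the guidance formula `uncond + scale * (cond - uncond)` are assumptions based on the commonly used forms of greedy CTC decoding and classifier-free guidance, since the abstract does not give the exact variants used in T2V2.

```python
from typing import List, Optional, Sequence

BLANK = 0  # assumed id of the CTC blank token (hypothetical choice)


def ctc_greedy_collapse(frame_ids: Sequence[int], blank: int = BLANK) -> List[int]:
    """Turn per-frame argmax ids into a token sequence.

    Standard greedy CTC rule: merge consecutive repeats, then drop blanks.
    """
    out: List[int] = []
    prev: Optional[int] = None
    for token in frame_ids:
        # Emit a token only when it differs from the previous frame
        # and is not the blank symbol.
        if token != prev and token != blank:
            out.append(token)
        prev = token
    return out


def cfg_logits(cond: Sequence[float], uncond: Sequence[float], scale: float) -> List[float]:
    """Combine conditional and unconditional logits (assumed common CFG form).

    scale = 0 gives the unconditional logits, scale = 1 the conditional ones,
    and scale > 1 extrapolates past the conditional logits.
    """
    if len(cond) != len(uncond):
        raise ValueError("cond and uncond must have the same length")
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


if __name__ == "__main__":
    # Frames: blank, 3, 3, blank, 3, 5, 5, blank
    print(ctc_greedy_collapse([0, 3, 3, 0, 3, 5, 5, 0]))
    print(cfg_logits([2.0, 0.0], [1.0, 1.0], scale=2.0))
```

In this sketch, the unconditional logits would come from the speech MLM run without text conditioning, and the conditional logits from the same model run with text; how T2V2 schedules or scales the guidance is not stated in the abstract.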
Cite
Text
Goswami et al. "T2V2: A Unified Non-Autoregressive Model for Speech Recognition and Synthesis via Multitask Learning." International Conference on Learning Representations, 2025.
Markdown
[Goswami et al. "T2V2: A Unified Non-Autoregressive Model for Speech Recognition and Synthesis via Multitask Learning." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/goswami2025iclr-t2v2/)
BibTeX
@inproceedings{goswami2025iclr-t2v2,
title = {{T2V2: A Unified Non-Autoregressive Model for Speech Recognition and Synthesis via Multitask Learning}},
author = {Goswami, Nabarun and Wang, Hanqin and Harada, Tatsuya},
booktitle = {International Conference on Learning Representations},
year = {2025},
url = {https://mlanthology.org/iclr/2025/goswami2025iclr-t2v2/}
}