Mu$^2$SLAM: Multitask, Multilingual Speech and Language Models
Abstract
We present Mu$^2$SLAM, a multilingual sequence-to-sequence model pre-trained jointly on unlabeled speech, unlabeled text and supervised data spanning Automatic Speech Recognition (ASR), Automatic Speech Translation (AST) and Machine Translation (MT), in over 100 languages. By leveraging a quantized representation of speech as a target, Mu$^2$SLAM trains the speech-text models with a sequence-to-sequence masked denoising objective similar to T5 on the decoder and a masked language modeling objective (MLM) on the encoder, for both unlabeled speech and text, while utilizing the supervised tasks to improve cross-lingual and cross-modal representation alignment within the model. On CoVoST AST, Mu$^2$SLAM establishes a new state-of-the-art for models trained on public datasets, improving on xx-en translation over the previous best by 1.9 BLEU points and on en-xx translation by 1.1 BLEU points. On Voxpopuli ASR, our model matches the performance of an mSLAM model fine-tuned with an RNN-T decoder, despite using a relatively weaker Transformer decoder. On text understanding tasks, our model improves by more than 6% over mSLAM on XNLI, getting closer to the performance of mT5 models of comparable capacity on XNLI and TydiQA, paving the way towards a single model for all speech and text understanding tasks.
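To make the pretraining recipe described above concrete, here is a minimal, illustrative Python sketch of how one unlabeled sequence (text subwords or quantized speech units) could be turned into the two self-supervised targets the abstract mentions: an MLM input/label pair for the encoder and a T5-style sentinel-delimited denoising target for the decoder. The function name, mask and sentinel ids, span length, and masking rate are assumptions for illustration only, not the paper's vocabulary layout or hyperparameters, and the exact corruption scheme in Mu$^2$SLAM may differ.

```python
import random

MASK_ID = 0                # illustrative mask token id (not the paper's vocabulary layout)
SENTINEL_BASE = 1_000_000  # illustrative sentinel ids for T5-style span denoising

def make_pretraining_example(tokens, span_len=3, mask_prob=0.3, seed=None):
    """Build one illustrative pretraining example from a sequence of discrete
    tokens (text subwords or quantized speech units).

    Returns:
      enc_input:  tokens with masked spans replaced by MASK_ID (encoder MLM input)
      mlm_labels: original ids at masked positions, -100 elsewhere (encoder MLM targets)
      dec_target: sentinel-delimited masked spans (T5-style decoder denoising target)
    """
    rng = random.Random(seed)
    enc_input, mlm_labels, dec_target = [], [], []
    i, sentinel = 0, SENTINEL_BASE
    while i < len(tokens):
        if rng.random() < mask_prob:
            span = tokens[i:i + span_len]
            enc_input.extend([MASK_ID] * len(span))   # encoder sees masked positions
            mlm_labels.extend(span)                   # encoder MLM must recover the originals
            dec_target.append(sentinel)               # decoder emits sentinel followed by the span
            dec_target.extend(span)
            sentinel += 1
            i += len(span)
        else:
            enc_input.append(tokens[i])
            mlm_labels.append(-100)                   # unmasked position, not scored by MLM loss
            i += 1
    return enc_input, mlm_labels, dec_target

# The same routine applies unchanged to text ids and to quantized speech-unit ids,
# which is what allows a single encoder-decoder to be pretrained on both modalities.
speech_units = [17, 42, 42, 8, 93, 5, 61, 61, 61, 12, 30, 7]
enc, mlm, dec = make_pretraining_example(speech_units, seed=0)
print(enc)
print(mlm)
print(dec)
```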
Cite
Text
Cheng et al. "Mu$^2$SLAM: Multitask, Multilingual Speech and Language Models." International Conference on Machine Learning, 2023.
Markdown
[Cheng et al. "Mu$^2$SLAM: Multitask, Multilingual Speech and Language Models." International Conference on Machine Learning, 2023.](https://mlanthology.org/icml/2023/cheng2023icml-mu/)
BibTeX
@inproceedings{cheng2023icml-mu,
  title     = {{Mu$^2$SLAM: Multitask, Multilingual Speech and Language Models}},
  author    = {Cheng, Yong and Zhang, Yu and Johnson, Melvin and Macherey, Wolfgang and Bapna, Ankur},
  booktitle = {International Conference on Machine Learning},
  year      = {2023},
  pages     = {5504--5520},
  volume    = {202},
  url       = {https://mlanthology.org/icml/2023/cheng2023icml-mu/}
}