Listen, Understand and Translate: Triple Supervision Decouples End-to-End Speech-to-Text Translation

Abstract

An end-to-end speech-to-text translation (ST) system takes audio in a source language and outputs text in a target language. Existing methods are limited by the amount of available parallel data. Can we build a system that fully utilizes the signals in a parallel ST corpus? We draw inspiration from the human understanding system, which is composed of auditory perception and cognitive processing. In this paper, we propose Listen-Understand-Translate (LUT), a unified framework with triple supervision signals that decouples the end-to-end speech-to-text translation task. LUT guides the acoustic encoder to extract as much information as possible from the auditory input. In addition, LUT uses a pre-trained BERT model to push the upper encoder to produce as much semantic information as possible, without extra data. We perform experiments on a diverse set of speech translation benchmarks, including Librispeech English-French, IWSLT English-German and TED English-Chinese. Our results demonstrate that LUT achieves state-of-the-art performance, outperforming previous methods. The code is available at https://github.com/dqqcasia/st.
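The abstract names three supervision signals (acoustic, semantic via BERT, translation) but not their exact loss functions. The sketch below assumes one plausible way to realize them: a CTC loss aligning the acoustic encoder's per-frame outputs with the source transcript, a mean squared distance pulling the upper encoder's states toward pre-computed BERT states of the transcript (assumed already aligned to the same length), and token-level cross-entropy on the translation decoder, combined as a weighted sum. The function names and the weights `w_ctc`, `w_sem`, `w_trans` are hypothetical, not taken from the paper or its repository.

```python
import math
from typing import Sequence

NEG_INF = float("-inf")


def _logsumexp(*xs: float) -> float:
    """Numerically stable log(sum(exp(x))) over a few log-domain values."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_nll(log_probs: Sequence[Sequence[float]], target: Sequence[int], blank: int = 0) -> float:
    """Acoustic signal (assumed CTC): -log P(transcript | per-frame log-probs)."""
    ext = [blank]  # interleave blanks: _ y1 _ y2 _ ...
    for y in target:
        ext += [y, blank]
    S = len(ext)
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for frame in log_probs[1:]:
        new = [NEG_INF] * S
        for s in range(S):
            cands = [alpha[s]]  # stay on the same extended symbol
            if s >= 1:
                cands.append(alpha[s - 1])  # advance by one
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])  # skip the blank between distinct labels
            new[s] = _logsumexp(*cands) + frame[ext[s]]
        alpha = new
    # A valid path ends on the last label or on the trailing blank.
    total = alpha[-1] if S == 1 else _logsumexp(alpha[-1], alpha[-2])
    return -total


def semantic_mse(enc: Sequence[Sequence[float]], bert: Sequence[Sequence[float]]) -> float:
    """Semantic signal (assumed): mean squared distance to BERT states of the same shape."""
    if len(enc) != len(bert) or any(len(a) != len(b) for a, b in zip(enc, bert)):
        raise ValueError("encoder and BERT states must have the same shape")
    diffs = [(x - y) ** 2 for a, b in zip(enc, bert) for x, y in zip(a, b)]
    return sum(diffs) / len(diffs)


def translation_nll(log_probs: Sequence[Sequence[float]], target: Sequence[int]) -> float:
    """Translation signal: mean token-level cross-entropy of the decoder."""
    return -sum(step[y] for step, y in zip(log_probs, target)) / len(target)


def triple_loss(l_ctc: float, l_sem: float, l_trans: float,
                w_ctc: float = 1.0, w_sem: float = 1.0, w_trans: float = 1.0) -> float:
    """Weighted sum of the three signals; the weights are hypothetical knobs."""
    return w_ctc * l_ctc + w_sem * l_sem + w_trans * l_trans


if __name__ == "__main__":
    half = math.log(0.5)
    acoustic = ctc_nll([[half, half], [half, half]], target=[1])  # -log 0.75
    semantic = semantic_mse([[0.0, 1.0]], [[0.0, 0.0]])  # 0.5
    translation = translation_nll([[math.log(0.25), math.log(0.75)]], target=[1])  # -log 0.75
    print(triple_loss(acoustic, semantic, translation))
```

The code is plain Python with no autograd, so it only illustrates how the three objectives are computed and combined; a real training loop would use a deep learning framework's CTC, MSE and cross-entropy losses so that gradients flow back into the acoustic encoder, the upper encoder and the decoder.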

Cite

Text

Dong et al. "Listen, Understand and Translate: Triple Supervision Decouples End-to-End Speech-to-Text Translation." AAAI Conference on Artificial Intelligence, 2021. doi:10.1609/AAAI.V35I14.17509

Markdown

[Dong et al. "Listen, Understand and Translate: Triple Supervision Decouples End-to-End Speech-to-Text Translation." AAAI Conference on Artificial Intelligence, 2021.](https://mlanthology.org/aaai/2021/dong2021aaai-listen/) doi:10.1609/AAAI.V35I14.17509

BibTeX

@inproceedings{dong2021aaai-listen,
  title     = {{Listen, Understand and Translate: Triple Supervision Decouples End-to-End Speech-to-Text Translation}},
  author    = {Dong, Qianqian and Ye, Rong and Wang, Mingxuan and Zhou, Hao and Xu, Shuang and Xu, Bo and Li, Lei},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2021},
  pages     = {12749-12759},
  doi       = {10.1609/AAAI.V35I14.17509},
  url       = {https://mlanthology.org/aaai/2021/dong2021aaai-listen/}
}