Divergence-Guided Simultaneous Speech Translation

Abstract

To achieve high-quality translation with low latency, a Simultaneous Speech Translation (SimulST) system relies on a policy module to decide whether to translate immediately or wait for additional streaming input, along with a translation model capable of effectively handling partial speech input. Prior research has tackled these components separately, either using "wait-k" policies based on fixed-length segments or detected word boundaries, or dynamic policies based on different strategies (e.g., meaningful units), while employing offline models for prefix-to-prefix translation. In this paper, we propose Divergence-Guided Simultaneous Speech Translation (DiG-SST), a tightly integrated approach focusing on both translation quality and latency for streaming input. Specifically, we introduce a simple yet effective prefix-based strategy for training translation models with partial speech input, and develop an adaptive policy that makes read/write decisions for the translation model based on the expected divergence in translation distributions resulting from future input. Our experiments on multiple translation directions of the MuST-C benchmark demonstrate that our approach achieves a better trade-off between translation quality and latency compared to existing methods.
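
The read/write idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `kl_divergence` and `decide`, the choice of KL divergence as the divergence measure, the direction of the KL term, the threshold value, and the toy distributions are all assumptions made for illustration. In a real system the future-input distribution is unavailable at decision time, so the expected divergence would have to come from some estimator, which is not modeled here.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary.

    Assumption: KL is used here only as one plausible divergence measure.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def decide(expected_divergence, threshold):
    """Hypothetical policy: READ more speech if the next-token distribution
    is expected to change a lot with future input, otherwise WRITE a token."""
    return "READ" if expected_divergence > threshold else "WRITE"


# Toy next-token distributions over a 3-word vocabulary (made-up numbers).
p_partial = [0.5, 0.3, 0.2]  # given the speech received so far
p_future = [0.1, 0.8, 0.1]   # given additional speech (known only in hindsight)
p_same = [0.5, 0.3, 0.2]     # future input that changes nothing

d_high = kl_divergence(p_future, p_partial)
d_zero = kl_divergence(p_same, p_partial)

THRESHOLD = 0.1  # assumed value; in practice it would tune the latency/quality trade-off

print(decide(d_high, THRESHOLD))  # large divergence: wait for more input
print(decide(d_zero, THRESHOLD))  # no divergence: safe to emit a token
```

In this sketch, raising the threshold makes the policy write earlier (lower latency, higher risk of errors), while lowering it makes the policy read more before committing.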

Cite

Text

Chen et al. "Divergence-Guided Simultaneous Speech Translation." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I16.29733

Markdown

[Chen et al. "Divergence-Guided Simultaneous Speech Translation." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/chen2024aaai-divergence/) doi:10.1609/AAAI.V38I16.29733

BibTeX

@inproceedings{chen2024aaai-divergence,
  title     = {{Divergence-Guided Simultaneous Speech Translation}},
  author    = {Chen, Xinjie and Fan, Kai and Luo, Wei and Zhang, Linlin and Zhao, Libo and Liu, Xinggao and Huang, Zhongqiang},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {17799-17807},
  doi       = {10.1609/AAAI.V38I16.29733},
  url       = {https://mlanthology.org/aaai/2024/chen2024aaai-divergence/}
}