Recent Advances in Direct Speech-to-Text Translation
Abstract
Speech-to-text translation has attracted growing attention in recent years, and the number of studies has increased rapidly. In this paper, we present a comprehensive survey of direct speech translation that summarizes the current state-of-the-art techniques. First, we categorize existing research into three directions based on the main challenges: modeling burden, data scarcity, and application issues. To tackle the modeling burden, two main structures have been proposed: the encoder-decoder framework (Transformer and its variants) and multitask frameworks. To address data scarcity, recent work resorts to a range of techniques, such as data augmentation, pre-training, knowledge distillation, and multilingual modeling. We then analyze and summarize the application issues, which include real-time translation, segmentation, named entities, gender bias, and code-switching. Finally, we discuss some promising directions for future work.
Cite
Text
Xu et al. "Recent Advances in Direct Speech-to-Text Translation." International Joint Conference on Artificial Intelligence, 2023. doi:10.24963/IJCAI.2023/761
Markdown
[Xu et al. "Recent Advances in Direct Speech-to-Text Translation." International Joint Conference on Artificial Intelligence, 2023.](https://mlanthology.org/ijcai/2023/xu2023ijcai-recent/) doi:10.24963/IJCAI.2023/761
BibTeX
@inproceedings{xu2023ijcai-recent,
title = {{Recent Advances in Direct Speech-to-Text Translation}},
author = {Xu, Chen and Ye, Rong and Dong, Qianqian and Zhao, Chengqi and Ko, Tom and Wang, Mingxuan and Xiao, Tong and Zhu, Jingbo},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2023},
pages = {6796--6804},
doi = {10.24963/IJCAI.2023/761},
url = {https://mlanthology.org/ijcai/2023/xu2023ijcai-recent/}
}