Bridging the Domain Gap: Improve Informal Language Translation via Counterfactual Domain Adaptation
Abstract
Despite the near-human performance already achieved on formal texts such as news articles, neural machine translation still has difficulty dealing with "user-generated" texts, which exhibit diverse linguistic phenomena but lack large-scale, high-quality parallel corpora. To address this problem, we propose a counterfactual domain adaptation method to better leverage both large-scale source-domain data (formal texts) and small-scale target-domain data (informal texts). Specifically, by considering effective counterfactual conditions (the concatenations of source-domain texts and the target-domain tag), we construct counterfactual representations to fill the latent space of the target domain, which is sparse because of the small amount of data, thereby bridging the gap between the source-domain data and the target-domain data. Experiments on English-to-Chinese and Chinese-to-English translation tasks show that our method outperforms the base model trained only on the informal corpus by a large margin, and consistently surpasses different baseline methods by +1.12 to +4.34 BLEU points on different datasets. Furthermore, we show that our method achieves competitive performance on cross-domain language translation on four language pairs.
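As a rough illustration of the "counterfactual conditions" mentioned in the abstract (source-domain texts concatenated with the target-domain tag), the following minimal Python sketch builds such inputs. The tag strings `<formal>` and `<informal>`, the choice to prepend the tag, and the function names are all assumptions made for illustration; the abstract does not specify the tag format, and this sketch covers only the construction of the conditions, not the counterfactual representations the model derives from them.

```python
from typing import List

# Hypothetical domain tags; the abstract does not give the actual tag format.
FORMAL_TAG = "<formal>"
INFORMAL_TAG = "<informal>"


def tag_input(sentence: str, domain_tag: str) -> str:
    """Concatenate a domain tag with a sentence (prepending is an assumption)."""
    return f"{domain_tag} {sentence}"


def build_counterfactual_inputs(
    source_domain_sentences: List[str],
    target_tag: str = INFORMAL_TAG,
) -> List[str]:
    """Pair each source-domain (formal) sentence with the target-domain tag.

    The result is a "what if this formal sentence were informal?" condition:
    the text comes from the source domain, the tag from the target domain.
    """
    return [tag_input(s, target_tag) for s in source_domain_sentences]


if __name__ == "__main__":
    formal = ["The meeting was postponed.", "Prices rose sharply in March."]
    for line in build_counterfactual_inputs(formal):
        print(line)
```

In this sketch, a factual input would carry the tag of its own domain (for example `tag_input(sentence, FORMAL_TAG)` for a news sentence), while the counterfactual input swaps in the target-domain tag.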
Cite
Text
Wang et al. "Bridging the Domain Gap: Improve Informal Language Translation via Counterfactual Domain Adaptation." AAAI Conference on Artificial Intelligence, 2021. doi:10.1609/AAAI.V35I16.17645
Markdown
[Wang et al. "Bridging the Domain Gap: Improve Informal Language Translation via Counterfactual Domain Adaptation." AAAI Conference on Artificial Intelligence, 2021.](https://mlanthology.org/aaai/2021/wang2021aaai-bridging/) doi:10.1609/AAAI.V35I16.17645
BibTeX
@inproceedings{wang2021aaai-bridging,
title = {{Bridging the Domain Gap: Improve Informal Language Translation via Counterfactual Domain Adaptation}},
author = {Wang, Ke and Chen, Guandan and Huang, Zhongqiang and Wan, Xiaojun and Huang, Fei},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2021},
pages = {13970-13978},
doi = {10.1609/AAAI.V35I16.17645},
url = {https://mlanthology.org/aaai/2021/wang2021aaai-bridging/}
}