Exploring Bilingual Parallel Corpora for Syntactically Controllable Paraphrase Generation
Abstract
Paraphrase generation is of great importance to many downstream tasks in natural language processing. Recent efforts have focused on generating paraphrases in specific syntactic forms, but these approaches generally rely heavily on manually annotated paraphrase data, which is not readily available for many languages and domains. In this paper, we propose a novel end-to-end framework that leverages existing large-scale bilingual parallel corpora to generate paraphrases under the control of syntactic exemplars. To train a single model over both languages of a parallel corpus, we embed sentences from the two languages into the same content and style spaces with shared content and style encoders that use cross-lingual word embeddings. We propose an adversarial discriminator to disentangle the content and style spaces, and employ a latent variable to model the syntactic style of a given exemplar in order to guide the two decoders during generation. Additionally, we introduce cycle and masking learning schemes to train the model efficiently. Experiments and analyses demonstrate that the proposed model, trained only on bilingual parallel data, is capable of generating diverse paraphrases with the desired syntactic styles. Fine-tuning the trained model on a small paraphrase corpus makes it substantially outperform state-of-the-art paraphrase generation models trained on a larger paraphrase dataset.
Cite
Text
Liu et al. "Exploring Bilingual Parallel Corpora for Syntactically Controllable Paraphrase Generation." International Joint Conference on Artificial Intelligence, 2020. doi:10.24963/IJCAI.2020/547
Markdown
[Liu et al. "Exploring Bilingual Parallel Corpora for Syntactically Controllable Paraphrase Generation." International Joint Conference on Artificial Intelligence, 2020.](https://mlanthology.org/ijcai/2020/liu2020ijcai-exploring/) doi:10.24963/IJCAI.2020/547
BibTeX
@inproceedings{liu2020ijcai-exploring,
title = {{Exploring Bilingual Parallel Corpora for Syntactically Controllable Paraphrase Generation}},
author = {Liu, Mingtong and Yang, Erguang and Xiong, Deyi and Zhang, Yujie and Sheng, Chen and Hu, Changjian and Xu, Jinan and Chen, Yufeng},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2020},
pages = {3955-3961},
doi = {10.24963/IJCAI.2020/547},
url = {https://mlanthology.org/ijcai/2020/liu2020ijcai-exploring/}
}