Sequence Generation with Optimal-Transport-Enhanced Reinforcement Learning

Abstract

Reinforcement learning (RL) has been widely used to aid training in language generation. This is achieved by enhancing standard maximum likelihood objectives with user-specified reward functions that encourage global semantic consistency. We propose a principled approach to address the difficulties associated with RL-based solutions, namely high-variance gradients, uninformative rewards, and brittle training. By leveraging the optimal transport distance, we introduce a regularizer that significantly alleviates the above issues. Our formulation emphasizes the preservation of semantic features, enabling end-to-end training instead of ad-hoc fine-tuning, and, when combined with RL, it controls the exploration space for more efficient model updates. To validate the effectiveness of the proposed solution, we perform a comprehensive evaluation covering a wide variety of NLP tasks: machine translation, abstractive text summarization, and image captioning, with consistent improvements over competing solutions.
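
The abstract describes combining a policy-gradient (RL) objective with an optimal-transport regularizer that softly matches generated tokens to reference tokens. The paper's exact formulation is not reproduced on this page; the sketch below is a minimal illustration only, assuming a cosine cost between token embeddings, Sinkhorn iterations for the soft matching, and a user-chosen mixing weight ot_weight. It is not the authors' code.

# Minimal sketch: REINFORCE loss regularized by an entropy-regularized
# optimal transport (Sinkhorn) distance between generated and reference
# token embeddings. All names and hyperparameters here are illustrative
# assumptions, not taken from the paper.

import torch


def sinkhorn_ot_distance(x, y, n_iters=50, eps=0.1):
    """Entropy-regularized OT distance between two embedding sequences.

    x: (n, d) embeddings of the generated sequence
    y: (m, d) embeddings of the reference sequence
    """
    # Cosine cost matrix: low cost for semantically similar tokens.
    x_n = torch.nn.functional.normalize(x, dim=-1)
    y_n = torch.nn.functional.normalize(y, dim=-1)
    cost = 1.0 - x_n @ y_n.t()                      # (n, m)

    # Uniform marginals over the two token sets.
    n, m = cost.shape
    a = torch.full((n,), 1.0 / n, device=x.device)
    b = torch.full((m,), 1.0 / m, device=x.device)

    # Standard Sinkhorn iterations on the Gibbs kernel.
    K = torch.exp(-cost / eps)
    u = torch.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.t() @ u + 1e-8)
        u = a / (K @ v + 1e-8)
    transport_plan = u.unsqueeze(1) * K * v.unsqueeze(0)

    # Expected cost under the soft optimal matching.
    return (transport_plan * cost).sum()


def combined_loss(log_probs, reward, gen_emb, ref_emb, ot_weight=0.1):
    """REINFORCE loss plus the OT distance to the reference.

    log_probs: (T,) log-probabilities of the sampled tokens
    reward:    scalar sequence-level reward (e.g. BLEU / ROUGE)
    """
    rl_loss = -(reward * log_probs.sum())           # policy-gradient term
    ot_loss = sinkhorn_ot_distance(gen_emb, ref_emb)
    return rl_loss + ot_weight * ot_loss


# Example usage with random tensors (shapes only for illustration):
# gen_emb, ref_emb = torch.randn(12, 256), torch.randn(10, 256)
# loss = combined_loss(torch.randn(12), 0.7, gen_emb, ref_emb)

The OT term stays differentiable, so it can be trained end-to-end alongside the RL term rather than applied only as post-hoc fine-tuning, which is the property the abstract emphasizes.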

Cite

Text

Chen et al. "Sequence Generation with Optimal-Transport-Enhanced Reinforcement Learning." AAAI Conference on Artificial Intelligence, 2020. doi:10.1609/AAAI.V34I05.6249

Markdown

[Chen et al. "Sequence Generation with Optimal-Transport-Enhanced Reinforcement Learning." AAAI Conference on Artificial Intelligence, 2020.](https://mlanthology.org/aaai/2020/chen2020aaai-sequence/) doi:10.1609/AAAI.V34I05.6249

BibTeX

@inproceedings{chen2020aaai-sequence,
  title     = {{Sequence Generation with Optimal-Transport-Enhanced Reinforcement Learning}},
  author    = {Chen, Liqun and Bai, Ke and Tao, Chenyang and Zhang, Yizhe and Wang, Guoyin and Wang, Wenlin and Henao, Ricardo and Carin, Lawrence},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2020},
  pages     = {7512--7520},
  doi       = {10.1609/AAAI.V34I05.6249},
  url       = {https://mlanthology.org/aaai/2020/chen2020aaai-sequence/}
}