Hierarchical Reinforcement Learning for Open-Domain Dialog

Abdelrhman Saleh, Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Rosalind W. Picard

AAAI 2020 pp. 8741-8748

doi:10.1609/AAAI.V34I05.6400

Abstract

Open-domain dialog generation is a challenging problem; maximum likelihood training can lead to repetitive outputs, models have difficulty tracking long-term conversational goals, and training on standard movie or online datasets may lead to the generation of inappropriate, biased, or offensive text. Reinforcement Learning (RL) is a powerful framework that could potentially address these issues, for example by allowing a dialog model to optimize for reducing toxicity and repetitiveness. However, previous approaches which apply RL to open-domain dialog generation do so at the word level, making it difficult for the model to learn proper credit assignment for long-term conversational rewards. In this paper, we propose a novel approach to hierarchical reinforcement learning (HRL), VHRL, which uses policy gradients to tune the utterance-level embedding of a variational sequence model. This hierarchical approach provides greater flexibility for learning long-term, conversational rewards. We use self-play and RL to optimize for a set of human-centered conversation metrics, and show that our approach provides significant improvements – in terms of both human evaluation and automatic metrics – over state-of-the-art dialog models, including Transformers.
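
The utterance-level policy-gradient idea in the abstract can be illustrated with a small sketch. This is not the authors' VHRL implementation: it assumes, purely for illustration, that each utterance-level action is a continuous latent vector sampled from a diagonal Gaussian, and that one scalar reward arrives per utterance. The function names (`discounted_returns`, `gaussian_log_prob`, `reinforce_loss`) and the discount factor `gamma` are hypothetical. The sketch shows a standard REINFORCE surrogate loss computed over utterances rather than words, so each conversational reward is credited to a single high-level decision.

```python
import math


def discounted_returns(rewards, gamma=0.9):
    """Return G_t = r_t + gamma * G_{t+1} for each utterance-level step t."""
    returns = []
    running = 0.0
    # Walk the conversation backwards so each step accumulates future reward.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def gaussian_log_prob(z, mean, std):
    """Log-density of z under a diagonal Gaussian, summed over dimensions.

    Assumption: the utterance-level action is a continuous latent vector.
    """
    return sum(
        -0.5 * ((zi - mi) / si) ** 2 - math.log(si) - 0.5 * math.log(2 * math.pi)
        for zi, mi, si in zip(z, mean, std)
    )


def reinforce_loss(log_probs, rewards, gamma=0.9):
    """REINFORCE surrogate loss: -sum_t G_t * log pi(z_t | context).

    log_probs[t] is the log-probability of the latent chosen for utterance t;
    rewards[t] is the scalar conversational reward received after utterance t.
    """
    returns = discounted_returns(rewards, gamma)
    return -sum(g * lp for g, lp in zip(returns, log_probs))


if __name__ == "__main__":
    # Three utterances in one self-play conversation (made-up numbers).
    rewards = [1.0, 0.0, 2.0]
    log_probs = [-1.0, -2.0, -0.5]
    print(discounted_returns(rewards, gamma=0.5))
    print(reinforce_loss(log_probs, rewards, gamma=0.5))
```

In a word-level formulation the same reward would have to be spread across every token of every utterance, which is the credit-assignment difficulty the abstract describes; here the return is attached to one decision per utterance.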

Cite

Text

Saleh et al. "Hierarchical Reinforcement Learning for Open-Domain Dialog." AAAI Conference on Artificial Intelligence, 2020. doi:10.1609/AAAI.V34I05.6400

Markdown

[Saleh et al. "Hierarchical Reinforcement Learning for Open-Domain Dialog." AAAI Conference on Artificial Intelligence, 2020.](https://mlanthology.org/aaai/2020/saleh2020aaai-hierarchical/) doi:10.1609/AAAI.V34I05.6400

BibTeX

@inproceedings{saleh2020aaai-hierarchical,
  title     = {{Hierarchical Reinforcement Learning for Open-Domain Dialog}},
  author    = {Saleh, Abdelrhman and Jaques, Natasha and Ghandeharioun, Asma and Shen, Judy Hanwen and Picard, Rosalind W.},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2020},
  pages     = {8741--8748},
  doi       = {10.1609/AAAI.V34I05.6400},
  url       = {https://mlanthology.org/aaai/2020/saleh2020aaai-hierarchical/}
}