Way Off-Policy Batch Deep Reinforcement Learning of Human Preferences in Dialog

Text

Jaques et al. "Way Off-Policy Batch Deep Reinforcement Learning of Human Preferences in Dialog." International Conference on Learning Representations, 2020.

Markdown

[Jaques et al. "Way Off-Policy Batch Deep Reinforcement Learning of Human Preferences in Dialog." International Conference on Learning Representations, 2020.](https://mlanthology.org/iclr/2020/jaques2020iclr-way/)

BibTeX

@inproceedings{jaques2020iclr-way,
  title     = {{Way Off-Policy Batch Deep Reinforcement Learning of Human Preferences in Dialog}},
  author    = {Jaques, Natasha and Ghandeharioun, Asma and Shen, Judy Hanwen and Ferguson, Craig and Lapedriza, Agata and Jones, Noah and Gu, Shixiang and Picard, Rosalind},
  booktitle = {International Conference on Learning Representations},
  year      = {2020},
  url       = {https://mlanthology.org/iclr/2020/jaques2020iclr-way/}
}