Sample Efficient On-Line Learning of Optimal Dialogue Policies with Kalman Temporal Differences

Pietquin, Olivier; Geist, Matthieu; Chandramohan, Senthilkumar

doi:10.5591/978-1-57735-516-8/IJCAI11-314

IJCAI 2011 pp. 1878-1883

Abstract

Designing dialog policies for voice-enabled interfaces is a tailoring job that is most often left to natural language processing experts. This job is generally redone for every new dialog task because cross-domain transfer is not possible. For this reason, machine learning methods for dialog policy optimization have been investigated during the last 15 years. In particular, reinforcement learning (RL) is now part of the state of the art in this domain. Standard RL methods require testing more or less random changes to the policy on users in order to assess them as improvements or degradations. This is called on-policy learning. However, it can result in system behaviors that are unacceptable to users. Learning algorithms should ideally infer an optimal strategy by observing interactions generated by a non-optimal but acceptable strategy, that is, by learning off-policy. In this contribution, a sample-efficient, online and off-policy reinforcement learning algorithm is proposed to learn an optimal policy from a few hundred dialogues generated with a very simple handcrafted policy.

Cite

Text

Pietquin et al. "Sample Efficient On-Line Learning of Optimal Dialogue Policies with Kalman Temporal Differences." International Joint Conference on Artificial Intelligence, 2011. doi:10.5591/978-1-57735-516-8/IJCAI11-314

Markdown

[Pietquin et al. "Sample Efficient On-Line Learning of Optimal Dialogue Policies with Kalman Temporal Differences." International Joint Conference on Artificial Intelligence, 2011.](https://mlanthology.org/ijcai/2011/pietquin2011ijcai-sample/) doi:10.5591/978-1-57735-516-8/IJCAI11-314

BibTeX

@inproceedings{pietquin2011ijcai-sample,
  title     = {{Sample Efficient On-Line Learning of Optimal Dialogue Policies with Kalman Temporal Differences}},
  author    = {Pietquin, Olivier and Geist, Matthieu and Chandramohan, Senthilkumar},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2011},
  pages     = {1878-1883},
  doi       = {10.5591/978-1-57735-516-8/IJCAI11-314},
  url       = {https://mlanthology.org/ijcai/2011/pietquin2011ijcai-sample/}
}