Learning to Search Better than Your Teacher

Abstract

Methods for learning to search for structured prediction typically imitate a reference policy, with existing theoretical guarantees demonstrating low regret compared to that reference. This is unsatisfactory in many applications where the reference policy is suboptimal and the goal of learning is to improve upon it. Can learning to search work even when the reference is poor? We provide a new learning to search algorithm, LOLS, which does well relative to the reference policy, but additionally guarantees low regret compared to deviations from the learned policy: a local-optimality guarantee. Consequently, LOLS can improve upon the reference policy, unlike previous algorithms. This enables us to develop structured contextual bandits, a partial information structured prediction setting with many potential applications.

Cite

Text

Chang et al. "Learning to Search Better than Your Teacher." International Conference on Machine Learning, 2015.

Markdown

[Chang et al. "Learning to Search Better than Your Teacher." International Conference on Machine Learning, 2015.](https://mlanthology.org/icml/2015/chang2015icml-learning/)

BibTeX

@inproceedings{chang2015icml-learning,
  title     = {{Learning to Search Better than Your Teacher}},
  author    = {Chang, Kai-Wei and Krishnamurthy, Akshay and Agarwal, Alekh and Daumé, Hal and Langford, John},
  booktitle = {International Conference on Machine Learning},
  year      = {2015},
  pages     = {2058--2066},
  volume    = {37},
  url       = {https://mlanthology.org/icml/2015/chang2015icml-learning/}
}