Learning Beam Search Policies via Imitation Learning

Abstract

Beam search is widely used for approximate decoding in structured prediction problems. Models often use a beam at test time but ignore its existence at train time, and therefore do not explicitly learn how to use the beam. We develop a unifying meta-algorithm for learning beam search policies using imitation learning. In our setting, the beam is part of the model and not just an artifact of approximate decoding. Our meta-algorithm captures existing learning algorithms and suggests new ones. It also lets us show novel no-regret guarantees for learning beam search policies.
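For readers unfamiliar with the mechanism the abstract refers to, the decoding loop can be sketched generically: at each step, every element of the beam is expanded into its successors, a scoring policy ranks the candidates, and only the top-k survive. This is a minimal illustrative sketch, not the paper's algorithm; the function names and the toy scoring function are assumptions for the example.

```python
def beam_search(start, successors, score, beam_width, steps):
    """Return the highest-scoring state reachable within `steps` expansions.

    successors(state) -> iterable of next states
    score(state)      -> float, higher is better (stands in for a learned policy)
    """
    beam = [start]
    for _ in range(steps):
        # Expand every beam element into all of its successors.
        candidates = [nxt for state in beam for nxt in successors(state)]
        if not candidates:
            break
        # Keep only the beam_width best candidates under the scoring policy.
        beam = sorted(candidates, key=score, reverse=True)[:beam_width]
    return max(beam, key=score)

# Toy usage: build digit sequences, scoring each partial sequence by its sum.
best = beam_search((), lambda s: [s + (d,) for d in range(3)], sum, 2, 3)
# best -> (2, 2, 2)
```

In the paper's setting, `score` would be the learned policy being trained with imitation learning, rather than a fixed heuristic as in this toy example.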

Cite

Text

Negrinho et al. "Learning Beam Search Policies via Imitation Learning." Neural Information Processing Systems, 2018.

Markdown

[Negrinho et al. "Learning Beam Search Policies via Imitation Learning." Neural Information Processing Systems, 2018.](https://mlanthology.org/neurips/2018/negrinho2018neurips-learning/)

BibTeX

@inproceedings{negrinho2018neurips-learning,
  title     = {{Learning Beam Search Policies via Imitation Learning}},
  author    = {Negrinho, Renato and Gormley, Matthew and Gordon, Geoffrey J.},
  booktitle = {Neural Information Processing Systems},
  year      = {2018},
  pages     = {10652--10661},
  url       = {https://mlanthology.org/neurips/2018/negrinho2018neurips-learning/}
}