Learning Beam Search Policies via Imitation Learning
Abstract
Beam search is widely used for approximate decoding in structured prediction problems. Models often use a beam at test time but ignore its existence at train time, and therefore do not explicitly learn how to use the beam. We develop a unifying meta-algorithm for learning beam search policies using imitation learning. In our setting, the beam is part of the model and not just an artifact of approximate decoding. Our meta-algorithm captures existing learning algorithms and suggests new ones. It also lets us show novel no-regret guarantees for learning beam search policies.
Cite
Text
Negrinho et al. "Learning Beam Search Policies via Imitation Learning." Neural Information Processing Systems, 2018.
Markdown
[Negrinho et al. "Learning Beam Search Policies via Imitation Learning." Neural Information Processing Systems, 2018.](https://mlanthology.org/neurips/2018/negrinho2018neurips-learning/)
BibTeX
@inproceedings{negrinho2018neurips-learning,
title = {{Learning Beam Search Policies via Imitation Learning}},
author = {Negrinho, Renato and Gormley, Matthew and Gordon, Geoffrey J.},
booktitle = {Neural Information Processing Systems},
year = {2018},
pages = {10652-10661},
url = {https://mlanthology.org/neurips/2018/negrinho2018neurips-learning/}
}