Stochastic Beams and Where to Find Them: The Gumbel-Top-K Trick for Sampling Sequences Without Replacement
Abstract
The well-known Gumbel-Max trick for sampling from a categorical distribution can be extended to sample $k$ elements without replacement. We show how to implicitly apply this ’Gumbel-Top-$k$’ trick on a factorized distribution over sequences, allowing to draw exact samples without replacement using a Stochastic Beam Search. Even for exponentially large domains, the number of model evaluations grows only linear in $k$ and the maximum sampled sequence length. The algorithm creates a theoretical connection between sampling and (deterministic) beam search and can be used as a principled intermediate alternative. In a translation task, the proposed method compares favourably against alternatives to obtain diverse yet good quality translations. We show that sequences sampled without replacement can be used to construct low-variance estimators for expected sentence-level BLEU score and model entropy.
Cite
Text
Kool et al. "Stochastic Beams and Where to Find Them: The Gumbel-Top-K Trick for Sampling Sequences Without Replacement." International Conference on Machine Learning, 2019.Markdown
[Kool et al. "Stochastic Beams and Where to Find Them: The Gumbel-Top-K Trick for Sampling Sequences Without Replacement." International Conference on Machine Learning, 2019.](https://mlanthology.org/icml/2019/kool2019icml-stochastic/)BibTeX
@inproceedings{kool2019icml-stochastic,
title = {{Stochastic Beams and Where to Find Them: The Gumbel-Top-K Trick for Sampling Sequences Without Replacement}},
author = {Kool, Wouter and Van Hoof, Herke and Welling, Max},
booktitle = {International Conference on Machine Learning},
year = {2019},
pages = {3499-3508},
volume = {97},
url = {https://mlanthology.org/icml/2019/kool2019icml-stochastic/}
}