Principled Gradient-Based MCMC for Conditional Sampling of Text
Abstract
We consider the problem of sampling text from an energy-based model. This arises, for example, when sampling text from a neural language model subject to soft constraints. Although the target distribution is discrete, the internal computations of the energy function (given by the language model) are differentiable, so one would like to exploit gradient information within a method such as MCMC. Alas, all previous attempts to generalize gradient-based MCMC to text sampling fail to sample correctly from the target distribution. We propose a solution, along with variants, and study its theoretical properties. Through experiments on various forms of text generation, we demonstrate that our unbiased samplers are able to generate more fluent text while better adhering to the control objectives. The same methods could be used to sample from discrete energy-based models unrelated to text.
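To make the setting concrete, below is a minimal, hypothetical sketch of the general idea the abstract describes: using gradients of a differentiable energy to propose moves over a discrete space, with a Metropolis-Hastings correction so the chain remains unbiased for the target p(x) ∝ exp(-E(x)). This is an illustrative example in the spirit of gradient-informed discrete MCMC (e.g., Gibbs-with-gradients), not the paper's algorithm; the toy quadratic energy and all names here are assumptions standing in for a language-model-defined energy.

```python
# Illustrative sketch (not the paper's method): gradient-informed
# Metropolis-Hastings over binary vectors x in {0,1}^d, targeting
# p(x) proportional to exp(-E(x)) for a differentiable energy E.
import torch

torch.manual_seed(0)
d = 16
W = 0.1 * torch.randn(d, d)  # toy parameters; the paper's energy comes from a neural LM plus soft constraints

def energy(x):
    # Any differentiable energy works; here a simple quadratic form.
    return -(x @ W @ x)

def proposal_dist(x):
    """Categorical over which bit to flip, guided by the energy gradient."""
    x = x.detach().requires_grad_(True)
    (grad,) = torch.autograd.grad(energy(x), x)
    # First-order estimate of the energy change from flipping each bit i:
    # delta_i ~ dE/dx_i * (1 - 2 x_i).
    delta = grad * (1 - 2 * x.detach())
    return torch.distributions.Categorical(logits=-delta / 2)

x = torch.bernoulli(torch.full((d,), 0.5))
for step in range(1000):
    q_fwd = proposal_dist(x)
    i = q_fwd.sample()                    # propose flipping bit i
    x_new = x.clone()
    x_new[i] = 1 - x_new[i]
    q_rev = proposal_dist(x_new)
    # Metropolis-Hastings acceptance: target ratio times reverse/forward proposal ratio.
    log_alpha = (energy(x).detach() - energy(x_new).detach()
                 + q_rev.log_prob(i) - q_fwd.log_prob(i))
    if torch.rand(()).log() < log_alpha:
        x = x_new
```

The accept/reject step is what keeps the sampler exact: the gradient only shapes the proposal, while the Metropolis-Hastings ratio corrects for any error in that first-order approximation, which is the property the abstract's "unbiased samplers" refer to.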
Cite
Text
Du et al. "Principled Gradient-Based MCMC for Conditional Sampling of Text." International Conference on Machine Learning, 2024.
Markdown
[Du et al. "Principled Gradient-Based MCMC for Conditional Sampling of Text." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/du2024icml-principled/)
BibTeX
@inproceedings{du2024icml-principled,
title = {{Principled Gradient-Based MCMC for Conditional Sampling of Text}},
author = {Du, Li and Amini, Afra and Torroba Hennigen, Lucas and Yu, Xinyan Velocity and Lee, Holden and Eisner, Jason and Cotterell, Ryan},
booktitle = {International Conference on Machine Learning},
year = {2024},
pages = {11663--11685},
volume = {235},
url = {https://mlanthology.org/icml/2024/du2024icml-principled/}
}