Flexible and Efficient Grammar-Constrained Decoding

Abstract

Large Language Models (LLMs) are often asked to generate structured outputs that obey precise syntactic rules, such as code snippets or formatted data. Grammar-constrained decoding (GCD) can guarantee that LLM outputs match such rules by masking out tokens that will provably lead to outputs that do not belong to a specified context-free grammar (CFG). To guarantee soundness, GCD algorithms have to compute how a given LLM subword tokenizer can “align” with the tokens used by a given context-free grammar and compute token masks based on this information. Doing so efficiently is challenging, and existing GCD algorithms require tens of minutes to preprocess common grammars. We present a new GCD algorithm together with an implementation that offers 17.71x faster offline preprocessing than existing approaches while preserving state-of-the-art efficiency in online mask computation.
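
As a rough illustration of the token-masking idea described in the abstract, the Python sketch below checks, for a toy balanced-parentheses grammar and a made-up subword vocabulary, which tokens may be appended to a partial output without making it impossible to complete a sentence of the grammar. The grammar, the vocabulary, and the brute-force bounded-enumeration prefix check are assumptions chosen for illustration only; they are not the preprocessing algorithm proposed in the paper.

# Toy sketch of GCD-style token masking (illustrative only; not the paper's algorithm).
# Grammar: balanced parentheses, S -> "" | "(" S ")" S.

def language_up_to(n):
    """Enumerate every balanced-parenthesis string of length at most n."""
    sentences = {""}
    for _ in range(n // 2):
        sentences |= {
            "(" + a + ")" + b
            for a in sentences
            for b in sentences
            if len(a) + len(b) + 2 <= n
        }
    return sentences

MAX_LEN = 8
LANG = language_up_to(MAX_LEN)

def is_viable_prefix(s):
    """True if some sentence in the (length-bounded) language extends s."""
    return any(w.startswith(s) for w in LANG)

# A made-up subword vocabulary; tokens need not align with grammar terminals.
VOCAB = ["(", ")", "((", "))", "()", ")(", "<eos>"]

def token_mask(prefix):
    """Allow a token iff appending it keeps the output extendable to a
    sentence of the grammar; <eos> is allowed only on complete sentences."""
    return [
        (prefix in LANG) if tok == "<eos>" else is_viable_prefix(prefix + tok)
        for tok in VOCAB
    ]

if __name__ == "__main__":
    for prefix in ["", "(", "(()", "()"]:
        allowed = [tok for tok, ok in zip(VOCAB, token_mask(prefix)) if ok]
        print(f"prefix {prefix!r}: allowed tokens {allowed}")

Note that multi-character tokens such as "((" and ")(" do not line up one-to-one with the grammar's terminals; this is the tokenizer/grammar alignment issue the abstract refers to, and the paper's contribution is computing that alignment, and the resulting masks, efficiently.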

Cite

Text

Park et al. "Flexible and Efficient Grammar-Constrained Decoding." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Park et al. "Flexible and Efficient Grammar-Constrained Decoding." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/park2025icml-flexible/)

BibTeX

@inproceedings{park2025icml-flexible,
  title     = {{Flexible and Efficient Grammar-Constrained Decoding}},
  author    = {Park, Kanghee and Zhou, Timothy and D’Antoni, Loris},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {48262--48275},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/park2025icml-flexible/}
}