Language Modeling via Stochastic Processes

Abstract

Modern language models can generate high-quality short texts. However, they often meander or are incoherent when generating longer texts. These issues arise from the next-token-only language modeling objective. To address these issues, we introduce Time Control (TC), a language model that implicitly plans via a latent stochastic process. TC does this by learning a representation which maps the dynamics of how text changes in a document to the dynamics of a stochastic process of interest. Using this representation, the language model can generate text by first implicitly generating a document plan via the stochastic process, and then generating text that is consistent with this latent plan. Compared to domain-specific methods and a fine-tuned GPT2 across a variety of text domains, TC improves performance on text infilling and discourse coherence. On long text generation settings, TC better preserves text structure, both in ordering (up to +40% better) and in text-length consistency (up to +17% better). Human evaluators also prefer TC's output 28.6% more often than the baselines'.
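
The following is a minimal, hedged sketch of the two-stage generation pipeline the abstract describes: first sample a latent "document plan" from a stochastic process, then decode text conditioned on that plan. It assumes a Brownian-bridge-style latent process pinned at start and end embeddings; the latent dimension, sentence count, and the decode() hook standing in for the fine-tuned language model are illustrative, not the paper's implementation.

# Sketch: plan-then-generate with a Brownian-bridge-style latent process.
# Assumptions: 8-dim latents, one latent point per sentence, and a
# hypothetical decode(latent) that maps a latent point to a sentence.
import numpy as np

def sample_bridge_plan(z_start, z_end, num_sentences, sigma=1.0, seed=0):
    """Sample a latent trajectory pinned at z_start (t=0) and z_end (t=T).

    Each intermediate point follows the Brownian bridge marginal: its mean
    interpolates linearly between the endpoints, and its variance peaks
    mid-document and vanishes at both ends.
    """
    rng = np.random.default_rng(seed)
    z_start, z_end = np.asarray(z_start, float), np.asarray(z_end, float)
    T = num_sentences - 1
    plan = [z_start]
    for t in range(1, T):
        alpha = t / T
        mean = (1 - alpha) * z_start + alpha * z_end
        var = sigma**2 * T * alpha * (1 - alpha)
        plan.append(rng.normal(mean, np.sqrt(var)))
    plan.append(z_end)
    return plan

# Usage: embed the desired start and end of the document, sample a plan,
# then decode one sentence per latent point (decode() is hypothetical).
z0, zT = np.zeros(8), np.ones(8)
latent_plan = sample_bridge_plan(z0, zT, num_sentences=5)
# sentences = [decode(z) for z in latent_plan]

Because the bridge is pinned at both ends, sampled plans stay anchored to the intended document endpoints while still varying in the middle, which is the intuition behind the ordering and length-consistency gains reported above.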

Cite

Text

Wang et al. "Language Modeling via Stochastic Processes." International Conference on Learning Representations, 2022.

Markdown

[Wang et al. "Language Modeling via Stochastic Processes." International Conference on Learning Representations, 2022.](https://mlanthology.org/iclr/2022/wang2022iclr-language/)

BibTeX

@inproceedings{wang2022iclr-language,
  title     = {{Language Modeling via Stochastic Processes}},
  author    = {Wang, Rose E and Durmus, Esin and Goodman, Noah and Hashimoto, Tatsunori},
  booktitle = {International Conference on Learning Representations},
  year      = {2022},
  url       = {https://mlanthology.org/iclr/2022/wang2022iclr-language/}
}