Jet: A Modern Transformer-Based Normalizing Flow
Abstract
Normalizing generative flows emerged in the past as a promising class of generative models for natural images. This class of models has many advantages: the ability to efficiently compute the log-likelihood of the input data, fast generation, and a simple overall structure. Normalizing flows remained a topic of active research but later fell out of favor, as the visual quality of their samples was not competitive with other model classes, such as GANs, VQ-VAE-based approaches, or diffusion models. In this paper we revisit the design of coupling-based normalizing flow models by carefully ablating prior design choices and by using computational blocks based on the Vision Transformer architecture rather than convolutional neural networks. As a result, we achieve a much simpler architecture that matches existing normalizing flow models and improves over them when paired with pretraining. While the overall visual quality still lags behind current state-of-the-art models, we argue that strong normalizing flow models can help advance the research frontier by serving as building blocks of more powerful generative models.
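The property the abstract highlights, efficient exact log-likelihood, comes from the coupling-layer construction. Below is a minimal, hypothetical JAX sketch of an affine coupling layer, not the authors' implementation: `coupler` is an assumed stand-in for the paper's ViT-based block, a callable mapping one half of the channels to per-element log-scale and shift parameters.

```python
import jax.numpy as jnp

def coupling_forward(x, coupler):
    # Split channels (assumed even) in half; x1 conditions the transform of x2.
    x1, x2 = jnp.split(x, 2, axis=-1)
    log_scale, shift = coupler(x1)                 # predicted element-wise params
    y2 = x2 * jnp.exp(log_scale) + shift           # invertible affine transform
    # The Jacobian is triangular, so its log-determinant is just the sum of
    # log-scales; this is what makes exact likelihood computation cheap.
    logdet = jnp.sum(log_scale, axis=tuple(range(1, x.ndim)))
    return jnp.concatenate([x1, y2], axis=-1), logdet

def coupling_inverse(y, coupler):
    # Inversion reuses the same network, since y1 == x1 passes through unchanged.
    y1, y2 = jnp.split(y, 2, axis=-1)
    log_scale, shift = coupler(y1)
    x2 = (y2 - shift) * jnp.exp(-log_scale)
    return jnp.concatenate([y1, x2], axis=-1)

# Toy check with a fixed (untrained) coupler standing in for the ViT block.
def toy_coupler(h):
    # A real model would run a Vision Transformer here; this is a placeholder.
    return jnp.tanh(h), 0.5 * h                    # (log-scale, shift)

x = jnp.ones((1, 4, 8))                            # (batch, tokens, channels)
y, logdet = coupling_forward(x, toy_coupler)
x_rec = coupling_inverse(y, toy_coupler)           # recovers x up to float error
```

A full flow would stack many such layers with alternating channel splits so that every dimension is eventually transformed; sampling then amounts to drawing Gaussian noise and running the inverses in reverse order.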
Cite
Text
Kolesnikov et al. "Jet: A Modern Transformer-Based Normalizing Flow." Transactions on Machine Learning Research, 2025.
Markdown
[Kolesnikov et al. "Jet: A Modern Transformer-Based Normalizing Flow." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/kolesnikov2025tmlr-jet/)
BibTeX
@article{kolesnikov2025tmlr-jet,
  title   = {{Jet: A Modern Transformer-Based Normalizing Flow}},
  author  = {Kolesnikov, Alexander and Pinto, André Susano and Tschannen, Michael},
  journal = {Transactions on Machine Learning Research},
  year    = {2025},
  url     = {https://mlanthology.org/tmlr/2025/kolesnikov2025tmlr-jet/}
}