EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling

Abstract

Latent generative models have emerged as a leading approach for high-quality image synthesis. These models rely on an autoencoder to compress images into a latent space, followed by a generative model to learn the latent distribution. We identify that existing autoencoders lack equivariance to semantic-preserving transformations like scaling and rotation, resulting in complex latent spaces that hinder generative performance. To address this, we propose EQ-VAE, a simple regularization approach that enforces equivariance in the latent space, reducing its complexity without degrading reconstruction quality. By finetuning pre-trained autoencoders with EQ-VAE, we enhance the performance of several state-of-the-art generative models, including DiT, SiT, REPA and MaskGIT, achieving a $\times$7 speedup on DiT-XL/2 with only five epochs of SD-VAE fine-tuning. EQ-VAE is compatible with both continuous and discrete autoencoders, thus offering a versatile enhancement for a wide range of latent generative models.
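To make the equivariance idea concrete, here is a minimal NumPy sketch of the property the abstract describes: a semantic-preserving transformation (here a 90° rotation) should commute with the encoder, i.e. `encode(transform(x)) ≈ transform(encode(x))`. The `encode` function below is a toy 2×2 average-pooling stand-in for the real VAE encoder, and the squared-error penalty is only an illustration of the regularization target, not the paper's exact loss (which is applied through reconstruction):

```python
import numpy as np

def encode(x):
    # Toy "encoder": 2x2 average pooling, a stand-in for the
    # real VAE encoder (assumption for illustration only).
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def eq_penalty(x, transform):
    # Equivariance penalty (sketch): encourage the latent map to
    # commute with the transformation,
    #   encode(transform(x)) ~= transform(encode(x)).
    z_of_transformed = encode(transform(x))
    transformed_z = transform(encode(x))
    return np.mean((z_of_transformed - transformed_z) ** 2)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
rot90 = lambda a: np.rot90(a)  # a semantic-preserving transform

# Average pooling commutes with 90-degree rotation, so the
# penalty is (numerically) zero for this toy encoder.
print(eq_penalty(x, rot90))
```

A learned encoder would not commute with such transforms by default; EQ-VAE fine-tunes the autoencoder so that a penalty of this kind stays small, which is what simplifies the latent space for the downstream generative model.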

Cite

Text

Kouzelis et al. "EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Kouzelis et al. "EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/kouzelis2025icml-eqvae/)

BibTeX

@inproceedings{kouzelis2025icml-eqvae,
  title     = {{EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling}},
  author    = {Kouzelis, Theodoros and Kakogeorgiou, Ioannis and Gidaris, Spyros and Komodakis, Nikos},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {31648--31666},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/kouzelis2025icml-eqvae/}
}