Dimensional Collapse in VQVAEs: Evidence and Remedies

Abstract

Vector-Quantized Variational Autoencoders (VQVAEs) have enabled strong performance in generative modeling by mapping continuous data to learnable discrete codes. In this work, we identify a surprising yet consistent phenomenon that we term *dimensional collapse*: despite using high-dimensional embeddings, VQVAEs tend to compress their representations into a much smaller subspace, typically only 4 to 10 dimensions. We analyze this phenomenon in depth and relate it to model performance and learning dynamics. Interestingly, VQVAEs naturally gravitate toward this low-dimensional regime, and enforcing higher-dimensional usage (e.g., via rank regularization) can degrade performance. To overcome this limitation, we propose **Divide-and-Conquer VQ (DCVQ)**, which partitions the latent space into multiple low-dimensional subspaces, each quantized independently. By design, each subspace respects the model's preference for low dimensionality, while their combination expands the overall capacity. Our results show that DCVQ overcomes the inherent dimensional bottleneck and improves reconstruction quality across image datasets.
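The partition-and-quantize scheme can be made concrete with a short sketch. Below is a minimal PyTorch illustration of the idea, assuming a flattened `(batch, dim)` latent: the latent is split into equal subspaces, each matched to its own codebook by nearest-neighbor lookup, and the quantized chunks are concatenated. The class name, codebook sizes, and the straight-through gradient trick are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class DCVQSketch(nn.Module):
    """Sketch of divide-and-conquer vector quantization: split a D-dim
    latent into K low-dimensional subspaces, each with its own codebook."""

    def __init__(self, dim: int = 64, num_subspaces: int = 8, codebook_size: int = 512):
        super().__init__()
        assert dim % num_subspaces == 0, "dim must divide evenly into subspaces"
        self.sub_dim = dim // num_subspaces  # e.g. 64 / 8 = 8 dims per codebook
        self.codebooks = nn.ModuleList(
            nn.Embedding(codebook_size, self.sub_dim) for _ in range(num_subspaces)
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, dim) continuous encoder output
        quantized = []
        for chunk, codebook in zip(z.chunk(len(self.codebooks), dim=-1), self.codebooks):
            # Nearest codebook entry per subspace (Euclidean distance).
            dists = torch.cdist(chunk, codebook.weight)   # (batch, codebook_size)
            idx = dists.argmin(dim=-1)                    # (batch,)
            quantized.append(codebook(idx))               # (batch, sub_dim)
        z_q = torch.cat(quantized, dim=-1)                # (batch, dim)
        # Straight-through estimator: gradients flow to the encoder as if
        # quantization were the identity map (standard VQ-VAE practice).
        return z + (z_q - z).detach()


# Usage: quantize a batch of 64-dim latents with 8 independent 8-dim codebooks.
vq = DCVQSketch(dim=64, num_subspaces=8, codebook_size=512)
z_q = vq(torch.randn(16, 64))
```

Each codebook only has to cover an 8-dimensional subspace, matching the low-dimensional regime the paper observes, while the concatenation spans the full 64-dimensional latent.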

Cite

Text

Zhang et al. "Dimensional Collapse in VQVAEs: Evidence and Remedies." Advances in Neural Information Processing Systems, 2025.

Markdown

[Zhang et al. "Dimensional Collapse in VQVAEs: Evidence and Remedies." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/zhang2025neurips-dimensional/)

BibTeX

@inproceedings{zhang2025neurips-dimensional,
  title     = {{Dimensional Collapse in VQVAEs: Evidence and Remedies}},
  author    = {Zhang, Jiayou and Shen, Yifan and Chen, Guangyi and Song, Le and Xing, Eric P.},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/zhang2025neurips-dimensional/}
}