HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes

Abstract

Vector quantization (VQ) is a technique to deterministically learn features with discrete codebook representations. It is commonly performed with a variational autoencoding model, VQ-VAE, which can be further extended to hierarchical structures for making high-fidelity reconstructions. However, such hierarchical extensions of VQ-VAE often suffer from the codebook/layer collapse issue, where the codebook is not efficiently used to express the data, which degrades reconstruction accuracy. To mitigate this problem, we propose a novel unified framework for stochastically learning hierarchical discrete representations on the basis of the variational Bayes framework, called hierarchically quantized variational autoencoder (HQ-VAE). HQ-VAE naturally generalizes the hierarchical variants of VQ-VAE, such as VQ-VAE-2 and residual-quantized VAE (RQ-VAE), and provides them with a Bayesian training scheme. Our comprehensive experiments on image datasets show that HQ-VAE enhances codebook usage and improves reconstruction performance. We also validated HQ-VAE in terms of its applicability to a different modality with an audio dataset.
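To make the two building blocks mentioned in the abstract concrete, here is a minimal NumPy sketch of deterministic vector quantization (nearest-codebook lookup) and residual quantization in the style of RQ-VAE, where each layer quantizes the residual left by the previous layers. This is an illustrative toy, not the paper's HQ-VAE algorithm; all function names and shapes are assumptions for the example.

```python
import numpy as np

def vector_quantize(z, codebook):
    """Deterministic VQ: map each latent vector to its nearest codebook entry.

    z: (N, D) latent vectors; codebook: (K, D) code vectors.
    Returns (indices, quantized), where quantized[i] = codebook[indices[i]].
    """
    # Squared Euclidean distances between every latent and every code.
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)
    return idx, codebook[idx]

def residual_quantize(z, codebooks):
    """RQ-style multi-layer quantization (illustrative sketch).

    Each layer quantizes the residual of the previous layers, so the
    sum of the per-layer codes approximates z increasingly well.
    """
    residual = z
    total = np.zeros_like(z)
    indices = []
    for cb in codebooks:
        idx, q = vector_quantize(residual, cb)
        indices.append(idx)
        total = total + q
        residual = residual - q
    return indices, total
```

For example, with a coarse first codebook and a second codebook matching the leftover residuals, the two-layer sum reconstructs the input more closely than the first layer alone.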

Cite

Text

Takida et al. "HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes." Transactions on Machine Learning Research, 2024.

Markdown

[Takida et al. "HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/takida2024tmlr-hqvae/)

BibTeX

@article{takida2024tmlr-hqvae,
  title     = {{HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes}},
  author    = {Takida, Yuhta and Ikemiya, Yukara and Shibuya, Takashi and Shimada, Kazuki and Choi, Woosung and Lai, Chieh-Hsin and Murata, Naoki and Uesaka, Toshimitsu and Uchida, Kengo and Liao, Wei-Hsiang and Mitsufuji, Yuki},
  journal   = {Transactions on Machine Learning Research},
  year      = {2024},
  url       = {https://mlanthology.org/tmlr/2024/takida2024tmlr-hqvae/}
}