Autoregressive Image Generation Using Residual Quantization

Abstract

For autoregressive (AR) modeling of high-resolution images, vector quantization (VQ) represents an image as a sequence of discrete codes. A short code sequence is important for an AR model, since it reduces the computational cost of modeling long-range interactions among codes. However, we postulate that, due to the rate-distortion trade-off, previous VQ methods cannot shorten the code sequence and generate high-fidelity images at the same time. In this study, we propose a two-stage framework, consisting of a Residual-Quantized VAE (RQ-VAE) and an RQ-Transformer, to effectively generate high-resolution images. Given a fixed codebook size, RQ-VAE can precisely approximate the feature map of an image and represent the image as a stacked map of discrete codes. RQ-Transformer then learns to predict the quantized feature vector at the next position by predicting the next stack of codes. Thanks to the precise approximation by RQ-VAE, a 256x256 image can be represented as an 8x8 feature map, which lets RQ-Transformer substantially reduce its computational cost. Consequently, our framework outperforms existing AR models on various benchmarks of unconditional and conditional image generation. Our approach also samples high-quality images significantly faster than previous AR models.
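The core idea behind RQ-VAE is residual quantization: each feature vector is approximated by a sum of codebook entries, chosen by repeatedly quantizing the remaining residual, which yields a stack of codes per spatial position. The following is a minimal sketch of that idea for a single position of the feature map; the codebook size, feature dimension, and depth are illustrative assumptions, not the configuration used in the paper.

import numpy as np

def residual_quantize(z, codebook, depth):
    """Approximate feature vector z by a sum of `depth` codebook entries,
    quantizing the remaining residual at each step (shared codebook)."""
    residual = z.copy()
    quantized = np.zeros_like(z)
    codes = []
    for _ in range(depth):
        # pick the codebook entry nearest to the current residual
        dists = np.linalg.norm(codebook - residual, axis=1)
        k = int(np.argmin(dists))
        codes.append(k)
        quantized += codebook[k]
        residual -= codebook[k]
    return codes, quantized

# toy usage: a stack of 4 codes for one position (hypothetical sizes)
rng = np.random.default_rng(0)
codebook = rng.normal(size=(256, 64))   # 256 entries, 64-dim features
z = rng.normal(size=64)                 # one position of the feature map
codes, z_hat = residual_quantize(z, codebook, depth=4)
print(codes, np.linalg.norm(z - z_hat))

With a fixed codebook size, increasing the quantization depth reduces the approximation error, which is why the feature map can be kept small (e.g., 8x8) while preserving fidelity.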

Cite

Text

Lee et al. "Autoregressive Image Generation Using Residual Quantization." Conference on Computer Vision and Pattern Recognition, 2022. doi:10.1109/CVPR52688.2022.01123

Markdown

[Lee et al. "Autoregressive Image Generation Using Residual Quantization." Conference on Computer Vision and Pattern Recognition, 2022.](https://mlanthology.org/cvpr/2022/lee2022cvpr-autoregressive/) doi:10.1109/CVPR52688.2022.01123

BibTeX

@inproceedings{lee2022cvpr-autoregressive,
  title     = {{Autoregressive Image Generation Using Residual Quantization}},
  author    = {Lee, Doyup and Kim, Chiheon and Kim, Saehoon and Cho, Minsu and Han, Wook-Shin},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2022},
  pages     = {11523--11532},
  doi       = {10.1109/CVPR52688.2022.01123},
  url       = {https://mlanthology.org/cvpr/2022/lee2022cvpr-autoregressive/}
}