The Image Local Autoregressive Transformer

Abstract

Recently, autoregressive (AR) models for whole-image generation, empowered by transformers, have achieved performance comparable to or even better than that of Generative Adversarial Networks (GANs). Unfortunately, directly applying such AR models to edit local image regions may suffer from missing global information, slow inference, and information leakage of the local guidance. To address these limitations, we propose a novel model, the image Local Autoregressive Transformer (iLAT), to better facilitate locally guided image synthesis. iLAT learns novel local discrete representations through the newly proposed local autoregressive (LA) transformer, with its attention mask and convolution mechanism. iLAT can thus efficiently synthesize local image regions from key guidance information. We evaluate iLAT on various locally guided image synthesis tasks, such as pose-guided person image synthesis and face editing. Both quantitative and qualitative results show the efficacy of our model.
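
The abstract names an attention mask as part of the LA transformer but does not spell out its layout. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes that tokens outside the edited region ("global" tokens) are visible to every query, while tokens inside the edited region ("local" tokens) are visible only to local queries at the same or a later position. The function name `local_ar_mask`, the `is_local` flag list, and the choice to let a local token attend to itself are all hypothetical.

```python
def local_ar_mask(is_local):
    """Return mask[q][k] == True when query token q may attend to key token k.

    is_local: list of bools, True for tokens in the edited (local) region.
    Assumed rule (not taken from the paper's text):
      - global keys are visible to all queries;
      - local keys are visible only to local queries at the same or a later
        position, which keeps generation inside the region autoregressive
        and keeps local content from leaking into global tokens.
    """
    n = len(is_local)
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        for k in range(n):
            if not is_local[k]:
                # Global key: visible everywhere.
                mask[q][k] = True
            elif is_local[q] and k <= q:
                # Local key: causal visibility among local queries only.
                mask[q][k] = True
    return mask


if __name__ == "__main__":
    # Four tokens; the middle two form the edited region.
    for row in local_ar_mask([False, True, True, False]):
        print(["x" if v else "." for v in row])
```

Under this assumed rule, global rows never attend to local columns, which is one way to avoid the information leakage the abstract mentions, and only the local tokens need to be decoded step by step.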

Cite

Text

Cao et al. "The Image Local Autoregressive Transformer." Neural Information Processing Systems, 2021.

BibTeX

@inproceedings{cao2021neurips-image,
  title     = {{The Image Local Autoregressive Transformer}},
  author    = {Cao, Chenjie and Hong, Yuxin and Li, Xiang and Wang, Chengrong and Xu, Chengming and Fu, Yanwei and Xue, Xiangyang},
  booktitle = {Neural Information Processing Systems},
  year      = {2021},
  url       = {https://mlanthology.org/neurips/2021/cao2021neurips-image/}
}