Multi-Scale Contrastive Learning for Complex Scene Generation

Abstract

Recent advances in Generative Adversarial Networks (GANs) have enabled photo-realistic synthesis of single-object images. Yet, modeling more complex distributions, such as scenes with multiple objects, remains challenging. The difficulty stems from the vast variety of scene configurations, which contain multiple objects of different categories placed at varied locations. In this paper, we aim to alleviate this difficulty by enhancing the discriminative ability of the discriminator through a locally defined self-supervised pretext task. To this end, we design a discriminator that leverages multi-scale local feedback to guide the generator toward better modeling of local semantic structure in the scene. We then require the discriminator to carry out pixel-level contrastive learning at multiple scales to enhance its discriminative capability on local regions. Experimental results on several challenging scene datasets show that our method improves synthesis quality by a substantial margin over state-of-the-art baselines.
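The paper itself does not include an implementation here, but the core idea in the abstract, a pixel-level contrastive objective applied at multiple scales of a feature map, can be sketched as follows. This is a minimal NumPy illustration under assumed shapes and a standard InfoNCE formulation; the function names (`pixel_infonce`, `multiscale_pixel_contrastive`), the temperature value, and the use of 2x2 average pooling to build coarser scales are illustrative assumptions, not the authors' actual code.

```python
import numpy as np

def pixel_infonce(feat_a, feat_b, tau=0.07):
    """Pixel-level InfoNCE: each spatial location in feat_a treats the
    same location in feat_b as its positive and every other location as
    a negative. feat_a, feat_b: (C, H, W) feature maps (assumed shapes)."""
    C, H, W = feat_a.shape
    a = feat_a.reshape(C, -1).T                      # (HW, C) pixel embeddings
    b = feat_b.reshape(C, -1).T
    a = a / np.linalg.norm(a, axis=1, keepdims=True)  # L2-normalize
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / tau                           # (HW, HW) similarities
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))               # positives on the diagonal

def avg_pool2x(feat):
    """2x2 average pooling to produce the next coarser scale."""
    C, H, W = feat.shape
    return feat[:, :H // 2 * 2, :W // 2 * 2].reshape(
        C, H // 2, 2, W // 2, 2).mean(axis=(2, 4))

def multiscale_pixel_contrastive(feat_a, feat_b, n_scales=3):
    """Sum the pixel-level loss over progressively pooled scales, so the
    discriminator is penalized at both fine and coarse local regions."""
    total = 0.0
    for _ in range(n_scales):
        total += pixel_infonce(feat_a, feat_b)
        feat_a, feat_b = avg_pool2x(feat_a), avg_pool2x(feat_b)
    return total
```

As a sanity check, two identical feature maps (perfectly matched local structure) should yield a much lower loss than two unrelated ones, which is the signal that pushes the discriminator to distinguish local regions.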

Cite

Text

Lee et al. "Multi-Scale Contrastive Learning for Complex Scene Generation." Winter Conference on Applications of Computer Vision, 2023.

Markdown

[Lee et al. "Multi-Scale Contrastive Learning for Complex Scene Generation." Winter Conference on Applications of Computer Vision, 2023.](https://mlanthology.org/wacv/2023/lee2023wacv-multiscale/)

BibTeX

@inproceedings{lee2023wacv-multiscale,
  title     = {{Multi-Scale Contrastive Learning for Complex Scene Generation}},
  author    = {Lee, Hanbit and Kim, Youna and Lee, Sang-goo},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2023},
  pages     = {764--774},
  url       = {https://mlanthology.org/wacv/2023/lee2023wacv-multiscale/}
}