NoisyTwins: Class-Consistent and Diverse Image Generation Through StyleGANs

Abstract

StyleGANs are at the forefront of controllable image generation, as they produce a semantically disentangled latent space that is well suited to image editing and manipulation. However, the performance of StyleGANs severely degrades when they are trained via class-conditioning on large-scale long-tailed datasets. We find that one reason for this degradation is the collapse of the latents for each class in the W latent space. With NoisyTwins, we first introduce an effective and inexpensive noise augmentation strategy for class embeddings, and then decorrelate the latents via self-supervision in the W space. This decorrelation mitigates collapse, so our method preserves intra-class diversity along with class-consistency in image generation. We show the effectiveness of our approach on the large-scale real-world long-tailed datasets ImageNet-LT and iNaturalist 2019, where our method outperforms other methods by ~19% on FID, establishing a new state-of-the-art.
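The mechanism the abstract describes can be sketched in a few lines: augment a class embedding with noise to form "twin" inputs, map both through the mapping network to W latents, and apply a Barlow Twins-style redundancy-reduction loss that keeps twins aligned while decorrelating W dimensions. The sketch below is illustrative only, assuming a toy linear mapping network and made-up dimensions; it is not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_class_embedding(base_emb, sigma=0.2):
    """Augment class embeddings with Gaussian noise to create a 'twin' view."""
    return base_emb + sigma * rng.standard_normal(base_emb.shape)

def toy_mapping(c, z, W_c, W_z):
    """Stand-in for StyleGAN's mapping network: w = f(z, class embedding)."""
    return np.tanh(c @ W_c + z @ W_z)

def twins_decorrelation_loss(wa, wb, lam=5e-3):
    """Barlow Twins-style loss on twin W latents.

    Invariance term pulls twins together; off-diagonal term decorrelates
    W dimensions, which is what mitigates per-class latent collapse.
    """
    # Standardize each W dimension over the batch.
    za = (wa - wa.mean(0)) / (wa.std(0) + 1e-8)
    zb = (wb - wb.mean(0)) / (wb.std(0) + 1e-8)
    n = za.shape[0]
    c = za.T @ zb / n                              # d x d cross-correlation
    on_diag = ((np.diag(c) - 1.0) ** 2).sum()      # invariance term
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()  # redundancy reduction
    return on_diag + lam * off_diag

# Toy usage: twins share z and class, but differ in embedding noise.
batch, emb_dim, z_dim, w_dim = 64, 16, 16, 32
W_c = rng.standard_normal((emb_dim, w_dim)) / np.sqrt(emb_dim)
W_z = rng.standard_normal((z_dim, w_dim)) / np.sqrt(z_dim)
base = rng.standard_normal((batch, emb_dim))   # per-sample class embeddings
z = rng.standard_normal((batch, z_dim))
wa = toy_mapping(noisy_class_embedding(base), z, W_c, W_z)
wb = toy_mapping(noisy_class_embedding(base), z, W_c, W_z)
loss = twins_decorrelation_loss(wa, wb)
```

Minimizing this loss over the mapping network would spread same-class latents apart along decorrelated W dimensions rather than letting them collapse to a point, while the noise augmentation keeps the twins class-consistent.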

Cite

Text

Rangwani et al. "NoisyTwins: Class-Consistent and Diverse Image Generation Through StyleGANs." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.00580

Markdown

[Rangwani et al. "NoisyTwins: Class-Consistent and Diverse Image Generation Through StyleGANs." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/rangwani2023cvpr-noisytwins/) doi:10.1109/CVPR52729.2023.00580

BibTeX

@inproceedings{rangwani2023cvpr-noisytwins,
  title     = {{NoisyTwins: Class-Consistent and Diverse Image Generation Through StyleGANs}},
  author    = {Rangwani, Harsh and Bansal, Lavish and Sharma, Kartik and Karmali, Tejan and Jampani, Varun and Babu, R. Venkatesh},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {5987--5996},
  doi       = {10.1109/CVPR52729.2023.00580},
  url       = {https://mlanthology.org/cvpr/2023/rangwani2023cvpr-noisytwins/}
}