NoisyTwins: Class-Consistent and Diverse Image Generation Through StyleGANs
Abstract
StyleGANs are at the forefront of controllable image generation as they produce a latent space that is semantically disentangled, making it suitable for image editing and manipulation. However, the performance of StyleGANs severely degrades when trained via class-conditioning on large-scale long-tailed datasets. We find that one reason for degradation is the collapse of latents for each class in the W latent space. With NoisyTwins, we first introduce an effective and inexpensive augmentation strategy for class embeddings, which then decorrelates the latents based on self-supervision in the W space. This decorrelation mitigates collapse, ensuring that our method preserves intra-class diversity with class-consistency in image generation. We show the effectiveness of our approach on large-scale real-world long-tailed datasets of ImageNet-LT and iNaturalist 2019, where our method outperforms other methods by 19% on FID, establishing a new state-of-the-art.
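The two ingredients named in the abstract, noise augmentation of class embeddings and decorrelation of the resulting latents, can be illustrated with a small NumPy sketch. The abstract does not give the noise distribution, the loss, or the network, so everything here is an assumption made for illustration: isotropic Gaussian noise, a cross-correlation (redundancy-reduction) style loss, and a hypothetical one-layer `toy_mapping` standing in for the StyleGAN mapping network. It is not the authors' implementation.

```python
import numpy as np


def noisy_class_embedding(class_emb, sigma, rng):
    """Perturb class embeddings with Gaussian noise (assumed augmentation form)."""
    return class_emb + sigma * rng.standard_normal(class_emb.shape)


def toy_mapping(z, class_emb, weights):
    """Hypothetical stand-in for a mapping network: (z, class embedding) -> w."""
    return np.tanh(np.concatenate([z, class_emb], axis=1) @ weights)


def decorrelation_loss(w_a, w_b, lam=0.005, eps=1e-8):
    """Cross-correlation loss between two latent batches of shape (batch, dim).

    Assumed objective: push the diagonal of the cross-correlation matrix
    toward 1 (the two views agree) and the off-diagonal toward 0
    (latent dimensions are decorrelated).
    """
    # Standardize every latent dimension over the batch.
    z_a = (w_a - w_a.mean(axis=0)) / (w_a.std(axis=0) + eps)
    z_b = (w_b - w_b.mean(axis=0)) / (w_b.std(axis=0) + eps)
    # Cross-correlation matrix of shape (dim, dim).
    c = z_a.T @ z_b / w_a.shape[0]
    diag = np.diag(c)
    on_diag = np.sum((diag - 1.0) ** 2)
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)
    return on_diag + lam * off_diag


rng = np.random.default_rng(0)
batch, z_dim, emb_dim, w_dim = 64, 16, 8, 12  # illustrative sizes only
weights = rng.standard_normal((z_dim + emb_dim, w_dim)) / np.sqrt(z_dim + emb_dim)

z = rng.standard_normal((batch, z_dim))
# Every sample in the batch shares the same class embedding.
class_emb = np.tile(rng.standard_normal(emb_dim), (batch, 1))

# Two "twin" latent batches: same z, independently noised class embeddings.
w_a = toy_mapping(z, noisy_class_embedding(class_emb, 0.5, rng), weights)
w_b = toy_mapping(z, noisy_class_embedding(class_emb, 0.5, rng), weights)
loss = decorrelation_loss(w_a, w_b)
```

In a training loop, a term of this kind would be added to the generator objective as a regularizer; the noise scale and the weight `lam` used here are placeholders, not values from the paper.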
Cite
Text
Rangwani et al. "NoisyTwins: Class-Consistent and Diverse Image Generation Through StyleGANs." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.00580
Markdown
[Rangwani et al. "NoisyTwins: Class-Consistent and Diverse Image Generation Through StyleGANs." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/rangwani2023cvpr-noisytwins/) doi:10.1109/CVPR52729.2023.00580
BibTeX
@inproceedings{rangwani2023cvpr-noisytwins,
title = {{NoisyTwins: Class-Consistent and Diverse Image Generation Through StyleGANs}},
author = {Rangwani, Harsh and Bansal, Lavish and Sharma, Kartik and Karmali, Tejan and Jampani, Varun and Babu, R. Venkatesh},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2023},
pages = {5987-5996},
doi = {10.1109/CVPR52729.2023.00580},
url = {https://mlanthology.org/cvpr/2023/rangwani2023cvpr-noisytwins/}
}