Shifted Diffusion for Text-to-Image Generation

Zhou, Yufan; Liu, Bingchen; Zhu, Yizhe; Yang, Xiao; Chen, Changyou; Xu, Jinhui

doi:10.1109/CVPR52729.2023.00979

CVPR 2023 pp. 10157-10166

Abstract

We present Corgi, a novel method for text-to-image generation. Corgi is based on our proposed shifted diffusion model, which generates better image embeddings from input text. Unlike the baseline diffusion model used in DALL-E 2, our method seamlessly encodes prior knowledge of the pre-trained CLIP model in its diffusion process by designing a new initialization distribution and a new transition step of the diffusion. Compared to the strong DALL-E 2 baseline, our method generates image embeddings from text more efficiently and more effectively, which in turn yields better text-to-image generation. Extensive large-scale experiments, evaluated with both quantitative measures and human evaluation, indicate a stronger generation ability of our method compared to existing ones. Furthermore, our model enables semi-supervised and language-free training for text-to-image generation, where only some or none of the images in the training dataset have an associated caption. Trained with only 1.7% of the images captioned, our semi-supervised model obtains FID results comparable to DALL-E 2 on zero-shot text-to-image generation evaluated on MS-COCO. Corgi also achieves new state-of-the-art results across different datasets on downstream language-free text-to-image generation tasks, outperforming the previous method, Lafite, by a large margin.
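
The abstract does not give the formulas, so the following is only a minimal sketch of how a "new initialization distribution and a new transition step" could fit together. It assumes a scalar forward process whose transition adds a shift term, so that the chain ends near N(mu, sigma2) instead of the standard N(0, 1). The function names, the linear noise schedule, and the specific shift `(1 - sqrt(1 - beta)) * mu` are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (an assumed, commonly used choice)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def shifted_step(z_prev, beta, mu, sigma2, eps=None):
    """One forward step with a shift term (hypothetical form):
    z_t = sqrt(1 - beta) * z_prev + (1 - sqrt(1 - beta)) * mu
          + sqrt(beta * sigma2) * eps,  with eps ~ N(0, 1).
    Setting mu = 0 and sigma2 = 1 recovers the standard diffusion step.
    """
    if eps is None:
        eps = random.gauss(0.0, 1.0)
    a = math.sqrt(1.0 - beta)
    return a * z_prev + (1.0 - a) * mu + math.sqrt(beta * sigma2) * eps


def shifted_forward_moments(z0, mu, sigma2, betas):
    """Mean and variance of z_T given z0, tracked step by step."""
    mean, var = z0, 0.0
    for beta in betas:
        a = math.sqrt(1.0 - beta)
        mean = a * mean + (1.0 - a) * mu          # drifts toward mu
        var = (1.0 - beta) * var + beta * sigma2  # grows toward sigma2
    return mean, var


betas = linear_betas(1000)
mean_T, var_T = shifted_forward_moments(z0=-2.0, mu=3.0, sigma2=0.25, betas=betas)
```

In this sketch `mu` and `sigma2` stand in for statistics of the target embedding space (for example, ones estimated from CLIP image embeddings), so sampling would start from a distribution that already lies close to valid embeddings rather than from an uninformed standard Gaussian.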

Cite

Text

Zhou et al. "Shifted Diffusion for Text-to-Image Generation." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.00979

Markdown

[Zhou et al. "Shifted Diffusion for Text-to-Image Generation." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/zhou2023cvpr-shifted/) doi:10.1109/CVPR52729.2023.00979

BibTeX

@inproceedings{zhou2023cvpr-shifted,
  title     = {{Shifted Diffusion for Text-to-Image Generation}},
  author    = {Zhou, Yufan and Liu, Bingchen and Zhu, Yizhe and Yang, Xiao and Chen, Changyou and Xu, Jinhui},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {10157--10166},
  doi       = {10.1109/CVPR52729.2023.00979},
  url       = {https://mlanthology.org/cvpr/2023/zhou2023cvpr-shifted/}
}