LeftRefill: Filling Right Canvas Based on Left Reference Through Generalized Text-to-Image Diffusion Model

Abstract

This paper introduces LeftRefill, an innovative approach to efficiently harness large Text-to-Image (T2I) diffusion models for reference-guided image synthesis. As the name implies, LeftRefill horizontally stitches the reference and target views together into a single input. The reference image occupies the left side, while the target canvas is positioned on the right. LeftRefill then paints the right-side target canvas based on the left-side reference and specific task instructions. This task formulation resembles contextual inpainting, akin to the actions of a human painter. The formulation efficiently learns both structural and textured correspondence between reference and target without additional image encoders or adapters. We inject task and view information through the cross-attention modules of T2I models, and further enable multi-view reference ability via re-arranged self-attention modules. These enable LeftRefill to perform consistent generation as a generalized model without requiring test-time fine-tuning or model modifications. Thus, LeftRefill can be seen as a simple yet unified framework for reference-guided synthesis. As an exemplar, we leverage LeftRefill to address two different challenges, reference-guided inpainting and novel view synthesis, based on the pre-trained StableDiffusion. Code and models are released at https://github.com/ewrfcas/LeftRefill.
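The left/right stitching described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `stitch_left_reference`, the blank fill value, and the binary mask marking the right half as the region to paint are all assumptions made for illustration, and images are represented as plain nested lists rather than tensors.

```python
def stitch_left_reference(reference, fill_value=0):
    """Build a side-by-side input from an H x W reference (list of rows).

    Hypothetical helper, not part of the LeftRefill codebase.
    Returns (canvas, mask), both H x 2W:
      canvas: reference on the left, blank target canvas on the right
      mask:   0 on the left (kept as-is), 1 on the right (to be painted)
    """
    if not reference or not reference[0]:
        raise ValueError("reference must be a non-empty 2D grid")
    width = len(reference[0])
    if any(len(row) != width for row in reference):
        raise ValueError("all rows must have the same width")

    # Append a blank target of the same width to the right of each row.
    canvas = [list(row) + [fill_value] * width for row in reference]
    # Assumed inpainting-style mask: only the right half is generated.
    mask = [[0] * width + [1] * width for _ in reference]
    return canvas, mask


reference = [[1, 2], [3, 4]]
canvas, mask = stitch_left_reference(reference)
```

In this sketch a diffusion inpainting model would receive `canvas` and `mask` together, so the reference pixels on the left stay visible as context while only the right half is synthesized. For reference-guided inpainting, the right half would presumably hold a partially masked target image rather than a fully blank canvas.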

Cite

Text

Cao et al. "LeftRefill: Filling Right Canvas Based on Left Reference Through Generalized Text-to-Image Diffusion Model." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00736

Markdown

[Cao et al. "LeftRefill: Filling Right Canvas Based on Left Reference Through Generalized Text-to-Image Diffusion Model." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/cao2024cvpr-leftrefill/) doi:10.1109/CVPR52733.2024.00736

BibTeX

@inproceedings{cao2024cvpr-leftrefill,
  title     = {{LeftRefill: Filling Right Canvas Based on Left Reference Through Generalized Text-to-Image Diffusion Model}},
  author    = {Cao, Chenjie and Cai, Yunuo and Dong, Qiaole and Wang, Yikai and Fu, Yanwei},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {7705--7715},
  doi       = {10.1109/CVPR52733.2024.00736},
  url       = {https://mlanthology.org/cvpr/2024/cao2024cvpr-leftrefill/}
}