Doubly Abductive Counterfactual Inference for Text-Based Image Editing

Abstract

We study text-based image editing (TBIE) of a single image through counterfactual inference, an elegant formulation that precisely addresses the core requirement: the edited image should retain the fidelity of the original one. Through the lens of this formulation, we find that the crux of TBIE is that existing techniques hardly achieve a good trade-off between editability and fidelity, mainly due to the overfitting of single-image fine-tuning. To this end, we propose a Doubly Abductive Counterfactual inference framework (DAC). First, we parameterize an exogenous variable as a UNet LoRA, whose abduction can encode all the image details. Second, we abduct another exogenous variable, parameterized by a text encoder LoRA, which recovers the editability lost to the overfitted first abduction. Because the second abduction exclusively encodes the visual transition from post-edit to pre-edit, its inversion (subtracting the LoRA) effectively reverts pre-edit back to post-edit, thereby accomplishing the edit. In extensive experiments, DAC achieves a good trade-off between editability and fidelity. It thus supports a wide spectrum of user editing intents, including addition, removal, manipulation, replacement, style transfer, and facial change, which are validated in both qualitative and quantitative evaluations. Code is available at https://github.com/xuesong39/DAC.
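
The idea of inverting an abduction by subtracting the LoRA can be illustrated on a single toy weight matrix. The following is a minimal sketch, not the authors' implementation: the helper names (`matmul`, `lora_delta`, `apply_lora`), the `sign` and `scale` parameters, and the 2x2 rank-1 matrices are all assumptions chosen for illustration. It only shows the weight arithmetic: a low-rank update `B @ A` is added to a frozen weight `W` in the abduction direction, and subtracted from it for the inversion.

```python
# Toy illustration (hypothetical names and values, not the DAC codebase):
# a LoRA update is the low-rank product B @ A added to a frozen weight W.
# "Subtracting the LoRA" applies the same update with the opposite sign.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_delta(B, A, scale=1.0):
    """Low-rank weight update: scale * (B @ A)."""
    return [[scale * v for v in row] for row in matmul(B, A)]

def apply_lora(W, B, A, sign=1, scale=1.0):
    """Return W + sign * scale * (B @ A); sign=-1 subtracts the update."""
    delta = lora_delta(B, A, scale)
    return [[w + sign * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

# Frozen base weight (assumed 2x2 identity for readability).
W = [[1.0, 0.0], [0.0, 1.0]]
# Rank-1 LoRA factors: B is 2x1, A is 1x2.
B = [[0.5], [0.25]]
A = [[2.0, 4.0]]

W_abducted = apply_lora(W, B, A, sign=+1)  # update added, as learned
W_inverted = apply_lora(W, B, A, sign=-1)  # same update, subtracted

print(W_abducted)
print(W_inverted)
```

In this sketch the two results sit symmetrically around the base weight `W`, which is the sense in which subtraction moves in the opposite direction to the learned update. How that translates into an actual image edit depends on the full diffusion pipeline described in the paper, which this toy example does not model.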

Cite

Text

Song et al. "Doubly Abductive Counterfactual Inference for Text-Based Image Editing." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00875

Markdown

[Song et al. "Doubly Abductive Counterfactual Inference for Text-Based Image Editing." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/song2024cvpr-doubly/) doi:10.1109/CVPR52733.2024.00875

BibTeX

@inproceedings{song2024cvpr-doubly,
  title     = {{Doubly Abductive Counterfactual Inference for Text-Based Image Editing}},
  author    = {Song, Xue and Cui, Jiequan and Zhang, Hanwang and Chen, Jingjing and Hong, Richang and Jiang, Yu-Gang},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {9162--9171},
  doi       = {10.1109/CVPR52733.2024.00875},
  url       = {https://mlanthology.org/cvpr/2024/song2024cvpr-doubly/}
}