Grounded, Controllable and Debiased Image Completion with Lexical Semantics

Zhang, Shengyu; Jiang, Tan; Huang, Qinghao; Tan, Ziqi; Kuang, Kun; Zhao, Zhou; Tang, Siliang; Yu, Jin; Yang, Hongxia; Yang, Yi; Wu, Fei

doi:10.1109/CVPRW53098.2021.00192

CVPRW 2021 pp. 1748-1751


Abstract

In this paper, we present Lexical Semantic Image Completion (LSIC), an approach with potential applications in art, design, and heritage conservation, among others. Existing image completion procedures are highly subjective because they consider only visual context, which may produce unpredictable results that are plausible but not faithful to grounded knowledge. To make the completion process both grounded and controllable, we advocate generating results faithful to both the visual context and the lexical semantic context, i.e., a description of the holes or blank regions left in the image (the hole description). One major challenge for LSIC comes from modeling and aligning the structure of the visual-semantic context and translating across modalities. We devise multi-grained reasoning blocks to address this challenge. Another challenge relates to unimodal biases, which occur when the model generates plausible results without using the textual description. We devise an unsupervised unpaired-creation learning path that explicitly performs counterfactual thinking, i.e., asks what the complete image would be if the incomplete image were given an unpaired text description. A cycle consistency loss is devised to guarantee counterfactual faithfulness. We conduct extensive quantitative and qualitative experiments that reveal the strengths of LSIC in being grounded, controllable, and debiased.
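The abstract does not spell out the form of the cycle consistency loss, so the following is only a minimal sketch of one possible reading: a text-to-image-to-text cycle in which the image completed from an unpaired description is mapped back to a description and compared with that unpaired text. Every name here (`cycle_consistency_loss`, `complete_fn`, `describe_fn`, the toy generators, and the list-based image and text representations) is a hypothetical stand-in for illustration, not the authors' implementation.

```python
from typing import Callable, List

Image = List[float]    # flattened grayscale pixels (toy representation, assumed)
Mask = List[int]       # 1 marks a hole pixel, 0 marks visible context
TextVec = List[float]  # hypothetical text embedding


def cycle_consistency_loss(
    image: Image,
    mask: Mask,
    unpaired_text: TextVec,
    complete_fn: Callable[[Image, Mask, TextVec], Image],
    describe_fn: Callable[[Image, Mask], TextVec],
) -> float:
    """Text -> image -> text cycle: mean L1 distance between the unpaired
    text embedding and the embedding recovered from the counterfactual image."""
    # Blank out the hole so the generator only sees the visual context.
    incomplete = [0.0 if m else p for p, m in zip(image, mask)]
    counterfactual = complete_fn(incomplete, mask, unpaired_text)
    recovered = describe_fn(counterfactual, mask)
    diffs = [abs(a - b) for a, b in zip(recovered, unpaired_text)]
    return sum(diffs) / len(unpaired_text)


# --- Toy stand-ins (hypothetical, for illustration only) ---
def text_aware_complete(incomplete: Image, mask: Mask, text: TextVec) -> Image:
    """Fills every hole pixel with the first text-embedding component."""
    return [text[0] if m else p for p, m in zip(incomplete, mask)]


def text_blind_complete(incomplete: Image, mask: Mask, text: TextVec) -> Image:
    """Ignores the text and fills holes with the mean visible pixel."""
    visible = [p for p, m in zip(incomplete, mask) if not m]
    fill = sum(visible) / len(visible)
    return [fill if m else p for p, m in zip(incomplete, mask)]


def describe_hole(image: Image, mask: Mask) -> TextVec:
    """Reads a one-dimensional 'description' back from the hole pixels."""
    hole = [p for p, m in zip(image, mask) if m]
    return [sum(hole) / len(hole)]


image = [0.2, 0.4, 0.9, 0.9]
mask = [0, 0, 1, 1]
unpaired_text = [0.5]

loss_aware = cycle_consistency_loss(
    image, mask, unpaired_text, text_aware_complete, describe_hole
)
loss_blind = cycle_consistency_loss(
    image, mask, unpaired_text, text_blind_complete, describe_hole
)
```

In this toy setup the text-aware generator incurs zero loss, while the text-blind generator, which mimics a unimodal bias by relying on visual context alone, is penalized. A real system would replace the stand-ins with learned networks and differentiable tensors.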


Cite

Text

Zhang et al. "Grounded, Controllable and Debiased Image Completion with Lexical Semantics." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2021. doi:10.1109/CVPRW53098.2021.00192

Markdown

[Zhang et al. "Grounded, Controllable and Debiased Image Completion with Lexical Semantics." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2021.](https://mlanthology.org/cvprw/2021/zhang2021cvprw-grounded/) doi:10.1109/CVPRW53098.2021.00192

BibTeX

@inproceedings{zhang2021cvprw-grounded,
  title     = {{Grounded, Controllable and Debiased Image Completion with Lexical Semantics}},
  author    = {Zhang, Shengyu and Jiang, Tan and Huang, Qinghao and Tan, Ziqi and Kuang, Kun and Zhao, Zhou and Tang, Siliang and Yu, Jin and Yang, Hongxia and Yang, Yi and Wu, Fei},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2021},
  pages     = {1748-1751},
  doi       = {10.1109/CVPRW53098.2021.00192},
  url       = {https://mlanthology.org/cvprw/2021/zhang2021cvprw-grounded/}
}