CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility

Abstract

Video inpainting is a crucial task with diverse applications, including fine-grained video editing, video recovery, and video dewatermarking. However, most existing video inpainting methods primarily focus on visual content completion while neglecting text information. There are only a limited number of text-guided video inpainting techniques, and these techniques struggle with maintaining visual quality and exhibit poor semantic representation capabilities. In this paper, we introduce CoCoCo, a text-guided video inpainting diffusion framework. To address the aforementioned challenges, we enhance both the training data and model structure. Specifically, we devise an instance-aware region selection strategy for masked area sampling and develop a novel motion block that incorporates efficient 3D full attention and textual cross attention. Additionally, our CoCoCo framework can be seamlessly integrated with various personalized text-to-image diffusion models through a delicate training-free transfer mechanism. Comprehensive experiments demonstrate that CoCoCo can create high-quality visual content with enhanced temporal consistency, improved text controllability, and better compatibility with personalized image models.

Cite

Text

Zi et al. "CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I10.33203

Markdown

[Zi et al. "CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/zi2025aaai-cococo/) doi:10.1609/AAAI.V39I10.33203

BibTeX

@inproceedings{zi2025aaai-cococo,
  title     = {{CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility}},
  author    = {Zi, Bojia and Zhao, Shihao and Qi, Xianbiao and Wang, Jianan and Shi, Yukai and Chen, Qianyu and Liang, Bin and Xiao, Rong and Wong, Kam-Fai and Zhang, Lei},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {11067--11076},
  doi       = {10.1609/AAAI.V39I10.33203},
  url       = {https://mlanthology.org/aaai/2025/zi2025aaai-cococo/}
}