Shaping a Stabilized Video by Mitigating Unintended Changes for Concept-Augmented Video Editing

Abstract

Text-driven video editing powered by generative diffusion models holds significant promise for applications spanning film production, advertising, and beyond. However, the limited expressiveness of pre-trained word embeddings often restricts nuanced edits, especially when targeting novel concepts with specific attributes. In this work, we present a novel Concept-Augmented Textual Inversion (CATI) framework that flexibly integrates new object information from user-provided concept videos. By fine-tuning only the value (V) projection of the attention layers via Low-Rank Adaptation (LoRA), our approach preserves the original attention distribution of the diffusion model while efficiently incorporating external concept knowledge. To further stabilize editing results and mitigate attention dispersion when prompt keywords are modified, we introduce a Dual Prior Supervision (DPS) mechanism. DPS supervises cross-attention between the source and target prompts, preventing undesired changes to non-target areas and improving the fidelity of novel concepts. Extensive evaluations demonstrate that our plug-and-play solution not only maintains spatial and temporal consistency but also outperforms state-of-the-art methods in generating lifelike and stable edited videos. The source code is publicly available at https://guomc9.github.io/STIVE-PAGE/.
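Why adapting only the value projection leaves the attention distribution intact can be seen from single-head cross-attention: the attention weights are softmax(QK^T / sqrt(d)) and depend only on the query and key projections, so a low-rank update to W_v changes what is read out but not where the model looks. The pure-Python sketch below illustrates this under stated assumptions: toy dimensions, random weights, one head, and a row-vector convention where the LoRA update is W_v + alpha * A @ B. It also includes a hypothetical `leakage_penalty` that measures how much attention an edited token draws from latent positions outside the region the source prompt assigned to the original word (the region is obtained here by thresholding at the mean, another assumption). This penalty is only an illustrative stand-in for the idea of supervising source/target cross-attention; it is not the paper's DPS formulation, and every function name here is hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale_matrix(m, s):
    return [[s * x for x in row] for row in m]


def softmax_rows(m):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def rand_matrix(rows, cols, rng, scale=0.5):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def cross_attention(latents, text, w_q, w_k, w_v, lora=None, alpha=1.0):
    """Hypothetical single-head cross-attention: queries from video latents,
    keys/values from text tokens. If lora=(A, B) is given, ONLY the value
    projection is adapted (W_v + alpha * A @ B); W_q and W_k stay frozen."""
    q = matmul(latents, w_q)
    k = matmul(text, w_k)
    if lora is not None:
        a, b = lora
        w_v = add(w_v, scale_matrix(matmul(a, b), alpha))
    v = matmul(text, w_v)
    scores = scale_matrix(matmul(q, transpose(k)), 1.0 / math.sqrt(len(q[0])))
    attn = softmax_rows(scores)  # depends on q and k only
    return attn, matmul(attn, v)


def leakage_penalty(attn, token_idx, region_mask):
    """Hypothetical stand-in for attention supervision (NOT the paper's DPS):
    mean attention the edited token receives from latent positions outside
    the source-derived region (region_mask[i] is True inside the region)."""
    outside = [row[token_idx] for row, inside in zip(attn, region_mask) if not inside]
    return sum(outside) / len(outside) if outside else 0.0


rng = random.Random(0)
d_model, d_text, d_head, rank = 4, 3, 4, 2
latents = rand_matrix(6, d_model, rng)  # 6 latent positions (toy stand-in for video latents)
text = rand_matrix(3, d_text, rng)  # 3 source-prompt token embeddings
w_q = rand_matrix(d_model, d_head, rng)
w_k = rand_matrix(d_text, d_head, rng)
w_v = rand_matrix(d_text, d_head, rng)
lora = (rand_matrix(d_text, rank, rng), rand_matrix(rank, d_head, rng))

attn_base, out_base = cross_attention(latents, text, w_q, w_k, w_v)
attn_lora, out_lora = cross_attention(latents, text, w_q, w_k, w_v, lora=lora)

# Target prompt: swap token 1 (the edited keyword) for a new embedding.
target_text = [row[:] for row in text]
target_text[1] = [rng.uniform(-0.5, 0.5) for _ in range(d_text)]
attn_target, _ = cross_attention(latents, target_text, w_q, w_k, w_v, lora=lora)

# Region the source prompt assigns to token 1 (mean threshold is an assumption).
src_col = [row[1] for row in attn_base]
threshold = sum(src_col) / len(src_col)
region_mask = [value >= threshold for value in src_col]
penalty = leakage_penalty(attn_target, 1, region_mask)

print(f"attention unchanged by value-only LoRA: {attn_base == attn_lora}")
print(f"outputs changed by value-only LoRA: {out_base != out_lora}")
print(f"leakage penalty: {penalty:.4f}")
```

Because the query and key paths are computed identically with and without the adapter, the two attention maps are exactly equal while the attended outputs differ. In a real training loop, a penalty of this kind would be added to the diffusion loss and minimized with respect to the adapter weights; how the paper actually combines its two priors is not described in this abstract.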

Cite

Text

Guo et al. "Shaping a Stabilized Video by Mitigating Unintended Changes for Concept-Augmented Video Editing." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/119

Markdown

[Guo et al. "Shaping a Stabilized Video by Mitigating Unintended Changes for Concept-Augmented Video Editing." International Joint Conference on Artificial Intelligence, 2025.](https://mlanthology.org/ijcai/2025/guo2025ijcai-shaping/) doi:10.24963/IJCAI.2025/119

BibTeX

@inproceedings{guo2025ijcai-shaping,
  title     = {{Shaping a Stabilized Video by Mitigating Unintended Changes for Concept-Augmented Video Editing}},
  author    = {Guo, Mingce and He, Jingxuan and Yin, Yufei and Wang, Zhangye and Tang, Shengeng and Cheng, Lechao},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {1062-1070},
  doi       = {10.24963/IJCAI.2025/119},
  url       = {https://mlanthology.org/ijcai/2025/guo2025ijcai-shaping/}
}