VidEdit: Zero-Shot and Spatially Aware Text-Driven Video Editing

Abstract

Recently, diffusion-based generative models have achieved remarkable success in image generation and editing. However, existing diffusion-based video editing approaches struggle to offer precise control over generated content while maintaining temporal consistency in long videos. Atlas-based methods, on the other hand, provide strong temporal consistency but make editing a video costly and lack spatial control. In this work, we introduce VidEdit, a novel method for zero-shot text-based video editing that guarantees robust temporal and spatial consistency. In particular, we combine an atlas-based video representation with a pre-trained text-to-image diffusion model to provide a training-free and efficient video editing method that is temporally smooth by design. To grant precise user control over generated content, we use conditional information extracted from off-the-shelf panoptic segmenters and edge detectors to guide the diffusion sampling process. This ensures fine spatial control over targeted regions while strictly preserving the structure of the original video. Our quantitative and qualitative experiments show that VidEdit outperforms state-of-the-art methods on the DAVIS dataset in terms of semantic faithfulness, image preservation, and temporal consistency metrics. With this framework, processing a single video takes only about one minute, and multiple compatible edits can be generated from a single text prompt.
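
The claim that an atlas-based edit is temporally smooth by design can be illustrated with a toy NumPy sketch. It is not the authors' implementation: the diffusion edit is replaced by a placeholder array, the segmentation mask is hand-written, and the frame-to-atlas mapping is an assumed integer lookup table (real atlas methods typically learn a continuous mapping). All function and variable names are hypothetical.

```python
import numpy as np

def blend_edit_into_atlas(atlas, edited, mask):
    """Keep the edit only inside the mask; the rest of the atlas is unchanged."""
    m = mask[..., None].astype(atlas.dtype)
    return m * edited + (1.0 - m) * atlas

def render_frames(atlas, uv):
    """Nearest-neighbour lookup, where uv[t, y, x] = (row, col) in the atlas."""
    return atlas[uv[..., 0], uv[..., 1]]

# Toy atlas: 4 x 6 pixels, RGB, all black.
atlas = np.zeros((4, 6, 3), dtype=np.float32)

# Placeholder for the output of a text-conditioned image editor (assumption).
edited = np.ones_like(atlas)

# Placeholder for a segmentation mask of the target object (assumption).
mask = np.zeros((4, 6), dtype=bool)
mask[1:3, 2:4] = True

edited_atlas = blend_edit_into_atlas(atlas, edited, mask)

# Two 4 x 4 frames; frame t views the atlas shifted by t columns (a camera pan).
T, H, W = 2, 4, 4
rr, cc = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
uv = np.stack([np.stack([rr, cc + t], axis=-1) for t in range(T)])

frames = render_frames(edited_atlas, uv)
```

Because the atlas is edited once and every frame only reads from it, a scene point that appears at different pixel locations in different frames receives exactly the same edited colour. The mask plays the role of the spatial control: atlas pixels outside it are left untouched.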

Cite

Text

Couairon et al. "VidEdit: Zero-Shot and Spatially Aware Text-Driven Video Editing." Transactions on Machine Learning Research, 2024.

Markdown

[Couairon et al. "VidEdit: Zero-Shot and Spatially Aware Text-Driven Video Editing." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/couairon2024tmlr-videdit/)

BibTeX

@article{couairon2024tmlr-videdit,
  title     = {{VidEdit: Zero-Shot and Spatially Aware Text-Driven Video Editing}},
  author    = {Couairon, Paul and Rambour, Clément and Haugeard, Jean-Emmanuel and Thome, Nicolas},
  journal   = {Transactions on Machine Learning Research},
  year      = {2024},
  url       = {https://mlanthology.org/tmlr/2024/couairon2024tmlr-videdit/}
}