DragVideo: Interactive Drag-Style Video Editing
Abstract
Video generation models have shown their superior ability to generate photo-realistic video. However, accurately controlling (or editing) the video remains a formidable challenge. The main issues are: 1) how to perform direct and accurate user control in editing; 2) how to perform edits such as changing shape, expression, and layout without unsightly distortion and artifacts in the edited content; and 3) how to maintain spatio-temporal consistency of the video after editing. To address these issues, we propose DragVideo, a general drag-style video editing framework. Inspired by DragGAN, DragVideo addresses issues 1) and 2) with a drag-style video latent optimization method, which provides the desired control by updating the noisy video latent according to drag instructions through a video-level drag objective function. We address issue 3) by integrating the video diffusion model with sample-specific LoRA and Mutual Self-Attention in DragVideo to ensure the edited result is spatio-temporally consistent. We also present a series of test examples for drag-style video editing and conduct extensive experiments across a wide array of challenging editing cases, showing that DragVideo can edit video in an intuitive manner that is faithful to the user's intention, with nearly unnoticeable distortion and artifacts, while maintaining spatio-temporal consistency. Traditional prompt-based video editing fails to achieve the former two, and directly applying image drag editing fails at the last, which underscores DragVideo's versatility and generality. Project page: https://dragvideo.github.io/
Cite
Text
Deng et al. "DragVideo: Interactive Drag-Style Video Editing." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72992-8_11
Markdown
[Deng et al. "DragVideo: Interactive Drag-Style Video Editing." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/deng2024eccv-dragvideo/) doi:10.1007/978-3-031-72992-8_11
BibTeX
@inproceedings{deng2024eccv-dragvideo,
title = {{DragVideo: Interactive Drag-Style Video Editing}},
author = {Deng, Yufan and Wang, Ruida and Zhang, Yuhao and Tai, Yu-Wing and Tang, Chi-Keung},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2024},
doi = {10.1007/978-3-031-72992-8_11},
url = {https://mlanthology.org/eccv/2024/deng2024eccv-dragvideo/}
}