AdaptEdit: An Adaptive Correspondence Guidance Framework for Reference-Based Video Editing
Abstract
Video editing is a pivotal process for customizing video content according to user needs. However, existing text-guided methods often leave user intentions ambiguous and restrict fine-grained control over specific aspects of a video. To overcome these limitations, this paper introduces AdaptEdit, a reference-based video editing approach that disentangles the editing process into two steps: it first edits a reference image and then adaptively propagates the edited appearance to the other frames to complete the video edit. Previous propagation methods, such as optical flow and the temporal modules of recent video generative models, struggle with object deformations and large motions. We instead propose an adaptive correspondence strategy that accurately transfers appearance from the reference frame to the target frames by leveraging inter-frame semantic correspondences in the original video. By using a proxy-editing task to optimize the hyperparameters of image token-level correspondence, our method balances preserving the target frame's structure against preventing leakage of irrelevant appearance. To evaluate editing more accurately than the semantic-level consistency provided by CLIP-style models allows, we introduce a new dataset, PVA, which supports pixel-level evaluation. Our method outperforms the best-performing baseline by 3.6 dB in PSNR.
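To put the reported PSNR gain in context, the following is a minimal sketch of the standard PSNR definition, not the paper's evaluation code. The function names `psnr` and `mse_ratio_for_gain`, the flat pixel-list input, and the 8-bit peak value of 255 are illustrative assumptions. The abstract does not say how frames or color channels are aggregated on PVA.

```python
import math


def psnr(reference, edited, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists.

    Assumes flat sequences of intensities with peak value `max_value`
    (255 for 8-bit images); this is the textbook definition only.
    """
    if len(reference) != len(edited) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, edited)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks for a PSNR gain of `gain_db` decibels."""
    return 10.0 ** (gain_db / 10.0)
```

Under this definition, a 3.6 dB gain corresponds to a mean squared error roughly 2.3 times smaller, since `mse_ratio_for_gain(3.6)` evaluates to about 2.29.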
Cite
Text
Su et al. "AdaptEdit: An Adaptive Correspondence Guidance Framework for Reference-Based Video Editing." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/1131
Markdown
[Su et al. "AdaptEdit: An Adaptive Correspondence Guidance Framework for Reference-Based Video Editing." International Joint Conference on Artificial Intelligence, 2025.](https://mlanthology.org/ijcai/2025/su2025ijcai-adaptedit/) doi:10.24963/IJCAI.2025/1131
BibTeX
@inproceedings{su2025ijcai-adaptedit,
title = {{AdaptEdit: An Adaptive Correspondence Guidance Framework for Reference-Based Video Editing}},
author = {Su, Tongtong and Wang, Chengyu and Liu, Bingyan and Huang, Jun and Lu, Dongming},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2025},
pages = {10180--10188},
doi = {10.24963/IJCAI.2025/1131},
url = {https://mlanthology.org/ijcai/2025/su2025ijcai-adaptedit/}
}