Deformable Sprites for Unsupervised Video Decomposition
Abstract
We describe a method to extract persistent elements of a dynamic scene from an input video. We represent each scene element as a Deformable Sprite consisting of three components: 1) a 2D texture image for the entire video, 2) per-frame masks for the element, and 3) non-rigid deformations that map the texture image into each video frame. The resulting decomposition allows for applications such as consistent video editing. Deformable Sprites are a type of video auto-encoder model that is optimized on each individual video; it requires neither training on a large dataset nor pre-trained models. Moreover, our method does not require object masks or other user input, and discovers a wider variety of moving objects than previous work. We evaluate our approach on standard video datasets and show qualitative results on a diverse array of Internet videos.
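To make the three-component representation concrete, here is a minimal NumPy sketch of how one frame could be reconstructed from several sprites. It is an illustration under stated assumptions, not the authors' implementation: each layer's per-frame deformation is assumed to be a dense coordinate map (frame pixel to texture pixel), textures are read with bilinear sampling, per-layer masks are assumed to be softmax-normalized so they sum to 1 at every pixel, and a mean-squared reconstruction error stands in for the auto-encoder objective. All function names (`bilinear_sample`, `softmax_masks`, `composite_frame`, `reconstruction_loss`) are hypothetical.

```python
import numpy as np


def bilinear_sample(texture, coords):
    """Sample an (Ht, Wt, C) texture at float (x, y) coords of shape (H, W, 2).

    Assumption: coords are in texture pixel units; samples outside the
    texture are clamped to its border.
    """
    ht, wt = texture.shape[:2]
    x = np.clip(coords[..., 0], 0, wt - 1)
    y = np.clip(coords[..., 1], 0, ht - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, wt - 1)
    y1 = np.minimum(y0 + 1, ht - 1)
    wx = (x - x0)[..., None]  # horizontal interpolation weight, (H, W, 1)
    wy = (y - y0)[..., None]  # vertical interpolation weight, (H, W, 1)
    top = (1 - wx) * texture[y0, x0] + wx * texture[y0, x1]
    bottom = (1 - wx) * texture[y1, x0] + wx * texture[y1, x1]
    return (1 - wy) * top + wy * bottom


def softmax_masks(logits):
    """Normalize per-layer mask logits (L, H, W) so masks sum to 1 per pixel."""
    z = logits - logits.max(axis=0, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)


def composite_frame(textures, coord_maps, mask_logits):
    """Reconstruct one frame from L sprites (hypothetical helper).

    textures:    list of L arrays (Ht, Wt, C), shared by every frame
    coord_maps:  list of L arrays (H, W, 2), this frame's deformation per layer
    mask_logits: array (L, H, W), this frame's unnormalized layer masks
    """
    masks = softmax_masks(mask_logits)
    frame = np.zeros(coord_maps[0].shape[:2] + (textures[0].shape[2],))
    for tex, coords, mask in zip(textures, coord_maps, masks):
        # Warp the shared texture into the frame, then weight it by its mask.
        frame += mask[..., None] * bilinear_sample(tex, coords)
    return frame


def reconstruction_loss(frame, recon):
    """Mean squared error between an input frame and its reconstruction."""
    return float(np.mean((frame - recon) ** 2))
```

For example, with identity coordinate maps and equal (all-zero) mask logits for an all-black background sprite and an all-white foreground sprite, every reconstructed pixel is 0.5. In an optimization of this kind, the textures, coordinate maps, and mask logits would be the free parameters fitted to minimize the reconstruction loss over all frames of a single video.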
Cite
Ye et al. "Deformable Sprites for Unsupervised Video Decomposition." Conference on Computer Vision and Pattern Recognition, 2022. doi:10.1109/CVPR52688.2022.00268
@inproceedings{ye2022cvpr-deformable,
title = {{Deformable Sprites for Unsupervised Video Decomposition}},
author = {Ye, Vickie and Li, Zhengqi and Tucker, Richard and Kanazawa, Angjoo and Snavely, Noah},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2022},
pages = {2657-2666},
doi = {10.1109/CVPR52688.2022.00268},
url = {https://mlanthology.org/cvpr/2022/ye2022cvpr-deformable/}
}