Learning Where to Cut from Edited Videos
Abstract
In this work we propose a new approach for accelerating the video editing process by identifying good moments in time at which to cut unedited videos. We first validate with a user study that there is indeed a consensus among human viewers about good and bad cut moments, and then formulate the problem as a classification task. To train for this task, we propose a self-supervised scheme that requires only pre-existing edited videos, of which large and diverse data is readily available. We then propose a contrastive learning framework to train a 3D ResNet model to predict good regions to cut. We validate our method with a second user study, which indicates that clips generated by our model are preferred over a number of baselines.
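The abstract does not spell out the training objective, so the following is only a hedged illustration of what a contrastive objective over cut-point embeddings could look like: an InfoNCE-style loss that scores an embedding of a real cut (mined from edited videos) against embeddings of artificial/random cuts. The loss form, temperature, and negative-sampling strategy here are assumptions, not details taken from the paper.

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss on L2-normalized embeddings.

    Illustrative sketch only: the paper trains a 3D ResNet contrastively,
    but this exact loss, the temperature value, and the way negatives
    (artificial cuts) are sampled are assumptions.
    """
    def normalize(v):
        v = np.asarray(v, dtype=float)
        return v / np.linalg.norm(v, axis=-1, keepdims=True)

    a = normalize(anchor)      # embedding around a real cut point
    p = normalize(positive)    # another view of the same cut
    n = normalize(negatives)   # embeddings of artificial/random cuts

    pos_logit = np.dot(a, p) / temperature   # similarity to the real cut
    neg_logits = n @ a / temperature         # similarities to fake cuts
    logits = np.concatenate([[pos_logit], neg_logits])

    # cross-entropy with the positive as the target class,
    # computed with a numerically stable log-sum-exp
    m = logits.max()
    return float(-pos_logit + m + np.log(np.exp(logits - m).sum()))
```

Minimizing this loss pushes the anchor embedding toward the real-cut positive and away from the artificial-cut negatives, which matches the abstract's high-level description of learning to discriminate good from bad cut moments.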
Cite
Text
Huang et al. "Learning Where to Cut from Edited Videos." IEEE/CVF International Conference on Computer Vision Workshops, 2021. doi:10.1109/ICCVW54120.2021.00360
Markdown
[Huang et al. "Learning Where to Cut from Edited Videos." IEEE/CVF International Conference on Computer Vision Workshops, 2021.](https://mlanthology.org/iccvw/2021/huang2021iccvw-learning/) doi:10.1109/ICCVW54120.2021.00360
BibTeX
@inproceedings{huang2021iccvw-learning,
title = {{Learning Where to Cut from Edited Videos}},
author = {Huang, Yuzhong and Bai, Xue and Wang, Oliver and Caba, Fabian and Agarwala, Aseem},
booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
year = {2021},
pages = {3208-3216},
doi = {10.1109/ICCVW54120.2021.00360},
url = {https://mlanthology.org/iccvw/2021/huang2021iccvw-learning/}
}