Learning Where to Cut from Edited Videos

Abstract

In this work we propose a new approach for accelerating the video editing process by identifying good moments in time to cut unedited videos. We first validate with a user study that human viewers do indeed agree on which cut moments are good and which are bad, and then formulate the problem as a classification task. To train for this task, we propose a self-supervised scheme that requires only pre-existing edited videos, which are readily available in large and diverse collections. We then propose a contrastive learning framework to train a 3D ResNet model to predict good regions to cut. We validate our method with a second user study, which indicates that clips generated by our model are preferred over those from a number of baselines.

Cite

Text

Huang et al. "Learning Where to Cut from Edited Videos." IEEE/CVF International Conference on Computer Vision Workshops, 2021. doi:10.1109/ICCVW54120.2021.00360

Markdown

[Huang et al. "Learning Where to Cut from Edited Videos." IEEE/CVF International Conference on Computer Vision Workshops, 2021.](https://mlanthology.org/iccvw/2021/huang2021iccvw-learning/) doi:10.1109/ICCVW54120.2021.00360

BibTeX

@inproceedings{huang2021iccvw-learning,
  title     = {{Learning Where to Cut from Edited Videos}},
  author    = {Huang, Yuzhong and Bai, Xue and Wang, Oliver and Caba, Fabian and Agarwala, Aseem},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2021},
  pages     = {3208--3216},
  doi       = {10.1109/ICCVW54120.2021.00360},
  url       = {https://mlanthology.org/iccvw/2021/huang2021iccvw-learning/}
}