Editing like Humans: A Contextual, Multimodal Framework for Automated Video Editing

Koorathota, Sharath C.; Adelman, Patrick; Cotton, Kelly; Sajda, Paul

doi:10.1109/CVPRW53098.2021.00186

CVPRW 2021 pp. 1701-1709

Abstract

We propose an automated video editing model, which we term contextual and multimodal video editing (CMVE). The model leverages visual and textual metadata describing videos, integrating essential information from both modalities, and applies an editing style learned from a single example video to combine clips coherently. The model is useful for tasks such as generating news clip montages and highlight reels from a text query that describes the video storyline. It exploits the perceptual similarity between video frames, objects in videos, and text descriptions to emulate coherent video editing. Amazon Mechanical Turk participants made judgements comparing CMVE to expert human editing. Experimental results showed no significant difference between CMVE-edited and human-edited videos in how well they matched the text query or in the level of interest they generated, suggesting that CMVE can effectively integrate semantic information across visual and textual modalities and create perceptually coherent, high-quality videos typical of human video editors. We publicly release an online demonstration of our method.
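The abstract says CMVE exploits perceptual similarity between frames, objects, and text descriptions, but it does not spell out the selection procedure. As a rough illustration only, the sketch below assumes that every clip and the text query have already been embedded into one shared vector space (the encoder is left unspecified) and then greedily orders clips by a weighted mix of relevance to the query and similarity to the previously chosen clip. The functions `cosine` and `sequence_clips` and the weight `alpha` are hypothetical names introduced here. This is not the authors' CMVE implementation, and it leaves out the editing style learned from an example video.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def sequence_clips(
    query: Vector,
    clips: Dict[str, Vector],
    length: int,
    alpha: float = 0.5,
) -> List[str]:
    """Hypothetical greedy montage builder (illustrative, not CMVE itself).

    Each step scores every unused clip as
        alpha * sim(query, clip) + (1 - alpha) * sim(previous clip, clip)
    so the montage stays on-topic (query relevance) while adjacent clips
    look alike (visual continuity). The first clip uses query relevance only.
    """
    chosen: List[str] = []
    remaining = dict(clips)  # copy so the caller's dict is not mutated
    while remaining and len(chosen) < length:

        def score(name: str) -> float:
            relevance = cosine(query, remaining[name])
            if not chosen:
                return relevance
            continuity = cosine(clips[chosen[-1]], remaining[name])
            return alpha * relevance + (1 - alpha) * continuity

        best = max(remaining, key=score)
        chosen.append(best)
        del remaining[best]
    return chosen


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real features would come from vision/text encoders.
    query = [1.0, 0.0]
    clips = {"a": [1.0, 0.1], "b": [0.0, 1.0], "c": [0.9, 0.4], "d": [0.7, 0.7]}
    print(sequence_clips(query, clips, 3))  # → ['a', 'c', 'd']
```

Raising `alpha` toward 1.0 favors clips that match the query regardless of order, while lowering it favors smooth transitions between consecutive clips; a greedy pass is used here only because it is simple, not because the source describes it.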

Cite

Text

Koorathota et al. "Editing like Humans: A Contextual, Multimodal Framework for Automated Video Editing." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2021. doi:10.1109/CVPRW53098.2021.00186

Markdown

[Koorathota et al. "Editing like Humans: A Contextual, Multimodal Framework for Automated Video Editing." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2021.](https://mlanthology.org/cvprw/2021/koorathota2021cvprw-editing/) doi:10.1109/CVPRW53098.2021.00186

BibTeX

@inproceedings{koorathota2021cvprw-editing,
  title     = {{Editing like Humans: A Contextual, Multimodal Framework for Automated Video Editing}},
  author    = {Koorathota, Sharath C. and Adelman, Patrick and Cotton, Kelly and Sajda, Paul},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2021},
  pages     = {1701--1709},
  doi       = {10.1109/CVPRW53098.2021.00186},
  url       = {https://mlanthology.org/cvprw/2021/koorathota2021cvprw-editing/}
}