Editing like Humans: A Contextual, Multimodal Framework for Automated Video Editing
Abstract
We propose an automated video editing model, which we term contextual and multimodal video editing (CMVE). The model leverages visual and textual metadata describing videos, integrating essential information from both modalities, and uses an editing style learned from a single example video to coherently combine clips. The model is useful for tasks such as generating news clip montages and highlight reels given a text query that describes the video storyline. It exploits the perceptual similarity between video frames, objects in videos, and text descriptions to emulate coherent video editing. Amazon Mechanical Turk participants made judgements comparing CMVE to expert human editing. Experimental results showed no significant difference between CMVE-edited and human-edited videos in how well they matched the text query or in the level of interest each generated, suggesting that CMVE can effectively integrate semantic information across visual and textual modalities and create perceptually coherent videos of a quality typical of human video editors. We publicly release an online demonstration of our method.
Cite
Text
Koorathota et al. "Editing like Humans: A Contextual, Multimodal Framework for Automated Video Editing." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2021. doi:10.1109/CVPRW53098.2021.00186
Markdown
[Koorathota et al. "Editing like Humans: A Contextual, Multimodal Framework for Automated Video Editing." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2021.](https://mlanthology.org/cvprw/2021/koorathota2021cvprw-editing/) doi:10.1109/CVPRW53098.2021.00186
BibTeX
@inproceedings{koorathota2021cvprw-editing,
title = {{Editing like Humans: A Contextual, Multimodal Framework for Automated Video Editing}},
author = {Koorathota, Sharath C. and Adelman, Patrick and Cotton, Kelly and Sajda, Paul},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2021},
pages = {1701--1709},
doi = {10.1109/CVPRW53098.2021.00186},
url = {https://mlanthology.org/cvprw/2021/koorathota2021cvprw-editing/}
}