Summarizing Long-Length Videos with GAN-Enhanced Audio/Visual Features
Abstract
In this paper, we propose a novel supervised method for summarizing long-length videos. Many recent works have presented successful results in video summarization. However, most videos in those works are short (~5 minutes), and the methods often break down on very long videos (~30 minutes). Moreover, most works use only visual features, even though audio provides useful cues for the task. Based on these observations, we present a model that exploits both visual and audio features. To handle long videos, our model also refines the extracted features using adversarial networks. To demonstrate our model, we have collected a new dataset of 63 e-sports videos (~30 minutes each), each accompanied by an editorial summary video about 10% of the length of the original. Evaluation on this dataset suggests that our method produces quality summaries for very long videos.
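The abstract only outlines the approach; as a rough illustration of the general idea (fusing per-segment visual and audio features and refining them adversarially before scoring segments), a minimal PyTorch sketch is given below. All module names, feature dimensions, and losses here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Hypothetical feature dimensions (not taken from the paper).
VISUAL_DIM, AUDIO_DIM, HIDDEN_DIM = 1024, 128, 256

class FeatureRefiner(nn.Module):
    """Generator: maps concatenated audio/visual features to refined features."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(VISUAL_DIM + AUDIO_DIM, HIDDEN_DIM),
            nn.ReLU(),
            nn.Linear(HIDDEN_DIM, HIDDEN_DIM),
        )

    def forward(self, visual, audio):
        return self.net(torch.cat([visual, audio], dim=-1))

class Discriminator(nn.Module):
    """Scores whether a refined feature looks like a target (summary-worthy) feature."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(HIDDEN_DIM, HIDDEN_DIM),
            nn.LeakyReLU(0.2),
            nn.Linear(HIDDEN_DIM, 1),
        )

    def forward(self, feat):
        return self.net(feat)

# Toy adversarial refinement step on random stand-in features.
refiner, disc = FeatureRefiner(), Discriminator()
opt_g = torch.optim.Adam(refiner.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

visual = torch.randn(8, VISUAL_DIM)  # e.g. CNN frame features per segment
audio = torch.randn(8, AUDIO_DIM)    # e.g. spectrogram-based audio features
real = torch.randn(8, HIDDEN_DIM)    # stand-in for target summary features

# Discriminator update: separate target features from refined ("fake") ones.
fake = refiner(visual, audio).detach()
loss_d = bce(disc(real), torch.ones(8, 1)) + bce(disc(fake), torch.zeros(8, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator update: push refined features toward the discriminator's "real" side.
loss_g = bce(disc(refiner(visual, audio)), torch.ones(8, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

The refined per-segment features would then feed a downstream scoring or selection stage to pick roughly 10% of the segments as the summary; that stage is omitted here.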
Cite
Text
Lee and Lee. "Summarizing Long-Length Videos with GAN-Enhanced Audio/Visual Features." IEEE/CVF International Conference on Computer Vision Workshops, 2019. doi:10.1109/ICCVW.2019.00462
Markdown
[Lee and Lee. "Summarizing Long-Length Videos with GAN-Enhanced Audio/Visual Features." IEEE/CVF International Conference on Computer Vision Workshops, 2019.](https://mlanthology.org/iccvw/2019/lee2019iccvw-summarizing/) doi:10.1109/ICCVW.2019.00462
BibTeX
@inproceedings{lee2019iccvw-summarizing,
title = {{Summarizing Long-Length Videos with GAN-Enhanced Audio/Visual Features}},
author = {Lee, Hansol and Lee, Gyemin},
booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
year = {2019},
pages = {3727--3731},
doi = {10.1109/ICCVW.2019.00462},
url = {https://mlanthology.org/iccvw/2019/lee2019iccvw-summarizing/}
}