Mr. HiSum: A Large-Scale Dataset for Video Highlight Detection and Summarization
Abstract
Video highlight detection is the task of automatically selecting the most engaging moments from a long video. This problem is highly challenging since it aims to learn a general way of finding highlights from a variety of videos in the real world. The task has an innate subjectivity because the definition of a highlight differs across individuals. Therefore, to detect consistent and meaningful highlights, prior benchmark datasets have been labeled by multiple (5-20) raters. Due to the high cost of manual labeling, most existing public benchmarks are extremely small in scale, containing only a few tens or hundreds of videos. This insufficient benchmark scale causes multiple issues such as unstable evaluation or high sensitivity to train-test splits. We present Mr. HiSum, a large-scale dataset for video highlight detection and summarization, containing 31,892 videos and reliable labels aggregated over 50,000+ users per video. We empirically demonstrate the reliability of the labels as frame importance through cross-dataset transfer and a user study.
Cite
Text
Sul et al. "Mr. HiSum: A Large-Scale Dataset for Video Highlight Detection and Summarization." Neural Information Processing Systems, 2023.
Markdown
[Sul et al. "Mr. HiSum: A Large-Scale Dataset for Video Highlight Detection and Summarization." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/sul2023neurips-mr/)
BibTeX
@inproceedings{sul2023neurips-mr,
title = {{Mr. HiSum: A Large-Scale Dataset for Video Highlight Detection and Summarization}},
author = {Sul, Jinhwan and Han, Jihoon and Lee, Joonseok},
booktitle = {Neural Information Processing Systems},
year = {2023},
url = {https://mlanthology.org/neurips/2023/sul2023neurips-mr/}
}