Highlight Detection with Pairwise Deep Ranking for First-Person Video Summarization

Abstract

The emergence of wearable devices such as portable cameras and smart glasses makes it possible to record lifelogging first-person videos. Browsing such long, unstructured videos is time-consuming and tedious. This paper studies the discovery of moments of major or special interest to the user (i.e., highlights) in a video, in order to summarize first-person videos. Specifically, we propose a novel pairwise deep ranking model that employs deep learning techniques to learn the relationship between highlight and non-highlight video segments. For highlight detection, we develop a two-stream network structure that represents video segments using complementary information: the appearance of video frames and the temporal dynamics across frames. Given a long personal video, the highlight detection model assigns a highlight score to each segment. The resulting highlight segments are used for summarization in two ways: video timelapse and video skimming. The former plays highlight segments at low speed and non-highlight segments at high speed, while the latter assembles the sequence of segments with the highest scores. On 100 hours of first-person videos covering 15 unique sports categories, our highlight detection improves accuracy by 10.5% over the state-of-the-art RankSVM method. Moreover, in a user study with 35 human subjects, our approaches produce video summaries of better quality.
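The ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the loss function, so the hinge form and the `margin` of 1.0 are assumptions, as are the function names, the score threshold, and the `slow`/`fast` playback rates. Per-segment highlight scores are taken as given here, whereas in the paper they come from the two-stream network.

```python
from typing import List, Sequence


def pairwise_hinge_loss(highlight_scores: Sequence[float],
                        non_highlight_scores: Sequence[float],
                        margin: float = 1.0) -> float:
    """Mean hinge loss over all (highlight, non-highlight) score pairs.

    Assumed form: a pair costs nothing once the highlight segment
    outscores the non-highlight segment by at least `margin`.
    """
    if len(highlight_scores) == 0 or len(non_highlight_scores) == 0:
        raise ValueError("need at least one score of each kind")
    total = 0.0
    for h in highlight_scores:
        for n in non_highlight_scores:
            total += max(0.0, margin - (h - n))
    return total / (len(highlight_scores) * len(non_highlight_scores))


def skim(scores: Sequence[float], k: int) -> List[int]:
    """Video skimming: indices of the k highest-scoring segments,
    returned in temporal order so the summary plays chronologically."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def timelapse_rates(scores: Sequence[float], threshold: float,
                    slow: float = 1.0, fast: float = 4.0) -> List[float]:
    """Video timelapse: play segments scoring at or above `threshold`
    at the slow rate and all other segments at the fast rate.
    The threshold and both rates are hypothetical values."""
    return [slow if s >= threshold else fast for s in scores]
```

For example, with segment scores `[0.1, 0.9, 0.4, 0.8]`, `skim(scores, 2)` selects segments 1 and 3, and `timelapse_rates(scores, 0.5)` plays those two segments at the slow rate and the other two at the fast rate.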

Cite

Text

Yao et al. "Highlight Detection with Pairwise Deep Ranking for First-Person Video Summarization." Conference on Computer Vision and Pattern Recognition, 2016. doi:10.1109/CVPR.2016.112

Markdown

[Yao et al. "Highlight Detection with Pairwise Deep Ranking for First-Person Video Summarization." Conference on Computer Vision and Pattern Recognition, 2016.](https://mlanthology.org/cvpr/2016/yao2016cvpr-highlight/) doi:10.1109/CVPR.2016.112

BibTeX

@inproceedings{yao2016cvpr-highlight,
  title     = {{Highlight Detection with Pairwise Deep Ranking for First-Person Video Summarization}},
  author    = {Yao, Ting and Mei, Tao and Rui, Yong},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2016},
  doi       = {10.1109/CVPR.2016.112},
  url       = {https://mlanthology.org/cvpr/2016/yao2016cvpr-highlight/}
}