Find Objects and Focus on Highlights: Mining Object Semantics for Video Highlight Detection via Graph Neural Networks

Abstract

With the increasing prevalence of portable computing devices, browsing unedited videos is time-consuming and tedious. Video highlight detection, which discovers the moments of major or special interest to a user in a video, has the potential to significantly ease this situation. Existing methods suffer from two problems. Firstly, most existing approaches focus only on learning holistic visual representations of videos and ignore object semantics when inferring video highlights. Secondly, current state-of-the-art approaches often adopt a pairwise ranking-based strategy, which cannot exploit global information to infer highlights. Therefore, we propose a novel video highlight detection framework, named VH-GNN, which constructs an object-aware graph and models the relationships between objects from a global view. To reduce computational cost, we decompose the whole graph into two types of graphs: a spatial graph that captures the complex interactions of objects within each frame, and a temporal graph that obtains an object-aware representation of each frame and captures the global information. In addition, we optimize the framework via a proposed multi-stage loss, where the first stage aims to determine the highlight probability and the second stage leverages the relationships between frames and focuses on hard examples from the former stage. Extensive experiments on two standard datasets provide strong evidence that VH-GNN achieves significant performance gains over state-of-the-art methods.
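
The spatial/temporal decomposition described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the dense (fully connected) graphs, the dot-product softmax edge weights, the mean pooling, the linear scoring vector `w`, and all function names are assumptions made for illustration, and the multi-stage loss is not covered. It only shows how per-frame object graphs can feed a frame-level graph, and why the decomposed graphs have fewer edges than one graph over all objects.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def propagate(nodes):
    """One round of message passing over a dense graph.

    Edge weights are a row-wise softmax of dot-product similarity
    (an assumption; the abstract does not specify the edge function).
    """
    adj = [softmax([dot(u, v) for v in nodes]) for u in nodes]
    dim = len(nodes[0])
    return [
        [sum(a * v[d] for a, v in zip(weights, nodes)) for d in range(dim)]
        for weights in adj
    ]


def mean_pool(nodes):
    """Average node features into a single vector."""
    dim = len(nodes[0])
    return [sum(v[d] for v in nodes) / len(nodes) for d in range(dim)]


def highlight_scores(video, w):
    """video: list of frames, each a list of object feature vectors."""
    # Spatial graphs: one per frame, nodes are the objects in that frame.
    frame_feats = [mean_pool(propagate(objs)) for objs in video]
    # Temporal graph: nodes are the object-aware frame representations.
    context = propagate(frame_feats)
    # Hypothetical linear scorer followed by a sigmoid.
    return [1.0 / (1.0 + math.exp(-dot(w, f))) for f in context]


random.seed(0)
T, N, D = 6, 4, 8  # frames, objects per frame, feature dimension (toy sizes)
video = [[[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]
         for _ in range(T)]
w = [random.gauss(0, 1) for _ in range(D)]
scores = highlight_scores(video, w)

# Edge counts under the dense-graph assumption.
full_edges = (T * N) ** 2            # one graph over all objects in the video
decomposed_edges = T * N * N + T * T  # T spatial graphs plus one temporal graph
```

With these toy sizes, a single dense graph over all objects would have 576 edges, while the decomposed spatial and temporal graphs together have 132, which is one way to read the abstract's claim about reduced computational cost.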

Cite

Text

Zhang et al. "Find Objects and Focus on Highlights: Mining Object Semantics for Video Highlight Detection via Graph Neural Networks." AAAI Conference on Artificial Intelligence, 2020. doi:10.1609/AAAI.V34I07.6988

Markdown

[Zhang et al. "Find Objects and Focus on Highlights: Mining Object Semantics for Video Highlight Detection via Graph Neural Networks." AAAI Conference on Artificial Intelligence, 2020.](https://mlanthology.org/aaai/2020/zhang2020aaai-find/) doi:10.1609/AAAI.V34I07.6988

BibTeX

@inproceedings{zhang2020aaai-find,
  title     = {{Find Objects and Focus on Highlights: Mining Object Semantics for Video Highlight Detection via Graph Neural Networks}},
  author    = {Zhang, Yingying and Gao, Junyu and Yang, Xiaoshan and Liu, Chang and Li, Yan and Xu, Changsheng},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2020},
  pages     = {12902-12909},
  doi       = {10.1609/AAAI.V34I07.6988},
  url       = {https://mlanthology.org/aaai/2020/zhang2020aaai-find/}
}