Joint Inference of Groups, Events and Human Roles in Aerial Videos

Abstract

With the advent of drones, aerial video analysis is becoming increasingly important; yet, it has received scant attention in the literature. This paper addresses the new problem of parsing low-resolution aerial videos of large spatial areas, in terms of 1) grouping, 2) recognizing events, and 3) assigning roles to people engaged in events. We propose a novel framework for joint inference over these tasks, as reasoning about each in isolation typically fails in our setting. Given noisy tracklets of people and detections of large objects and scene surfaces (e.g., building, grass), we use a spatiotemporal AND-OR graph to drive our joint inference, using Markov Chain Monte Carlo and dynamic programming. We also introduce a new formalism of spatiotemporal templates characterizing latent sub-events. For evaluation, we have collected and released a new aerial video dataset, captured with a hex-rotor flying over picnic areas rich with group events. Our results demonstrate that we successfully address the above inference tasks under challenging conditions.

Cite

Text

Shu et al. "Joint Inference of Groups, Events and Human Roles in Aerial Videos." Conference on Computer Vision and Pattern Recognition, 2015. doi:10.1109/CVPR.2015.7299088

Markdown

[Shu et al. "Joint Inference of Groups, Events and Human Roles in Aerial Videos." Conference on Computer Vision and Pattern Recognition, 2015.](https://mlanthology.org/cvpr/2015/shu2015cvpr-joint/) doi:10.1109/CVPR.2015.7299088

BibTeX

@inproceedings{shu2015cvpr-joint,
  title     = {{Joint Inference of Groups, Events and Human Roles in Aerial Videos}},
  author    = {Shu, Tianmin and Xie, Dan and Rothrock, Brandon and Todorovic, Sinisa and Zhu, Song Chun},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2015},
  doi       = {10.1109/CVPR.2015.7299088},
  url       = {https://mlanthology.org/cvpr/2015/shu2015cvpr-joint/}
}