Cost-Sensitive Top-Down/Bottom-up Inference for Multiscale Activity Recognition

Abstract

This paper addresses a new problem, that of multiscale activity recognition. Our goal is to detect and localize a wide range of activities, including individual actions and group activities, which may co-occur in high-resolution video. The video resolution allows digital zoom-in (or zoom-out) to examine fine details (or coarser scales), as needed for recognition. The key challenge is how to avoid running a multitude of detectors at all spatiotemporal scales, and yet arrive at a holistically consistent video interpretation. To this end, we use a three-layered AND-OR graph to jointly model group activities, individual actions, and participating objects. The AND-OR graph allows a principled formulation of efficient, cost-sensitive inference via an explore-exploit strategy. Our inference optimally schedules the following computational processes: 1) direct application of activity detectors, called the α process; 2) bottom-up inference based on detecting activity parts, called the β process; and 3) top-down inference based on detecting activity context, called the γ process. The scheduling iteratively maximizes the log-posteriors of the resulting parse graphs. For evaluation, we have compiled and benchmarked a new dataset of high-resolution videos of group and individual activities co-occurring in a courtyard of the UCLA campus.
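
The abstract does not spell out the scheduling rule, so the following is only a rough illustration of what cost-sensitive explore-exploit scheduling over the α, β, and γ processes might look like, not the authors' algorithm. It assumes an epsilon-greedy policy, a fixed compute budget, and a fixed positive cost per process; the function name `schedule`, the `(cost, run)` pairs, and the idea that each run returns a log-posterior gain are all hypothetical stand-ins.

```python
import random


def schedule(processes, budget, epsilon=0.2, seed=0):
    """Epsilon-greedy, cost-sensitive scheduling sketch (hypothetical).

    processes: dict mapping a process name to (cost, run), where cost is a
               positive number and run() returns the gain in log-posterior
               obtained by executing that process once.
    budget:    total compute budget available.
    Returns (accumulated log-posterior gain, list of chosen process names).
    """
    rng = random.Random(seed)
    names = sorted(processes)
    totals = {n: 0.0 for n in names}  # summed observed gains per process
    counts = {n: 0 for n in names}    # number of times each process ran
    log_posterior = 0.0
    trace = []

    while True:
        # Only processes that still fit in the remaining budget are candidates.
        affordable = [n for n in names if processes[n][0] <= budget]
        if not affordable:
            break

        untried = [n for n in affordable if counts[n] == 0]
        if untried:
            # Try every affordable process once before comparing them.
            choice = untried[0]
        elif rng.random() < epsilon:
            # Explore: pick any affordable process at random.
            choice = rng.choice(affordable)
        else:
            # Exploit: pick the best average gain per unit cost seen so far.
            choice = max(
                affordable,
                key=lambda n: totals[n] / counts[n] / processes[n][0],
            )

        cost, run = processes[choice]
        gain = run()
        totals[choice] += gain
        counts[choice] += 1
        budget -= cost
        log_posterior += gain
        trace.append(choice)

    return log_posterior, trace


if __name__ == "__main__":
    # Toy stand-ins for the three processes; costs and gains are made up.
    toy = {
        "alpha": (4, lambda: 1.0),  # direct detector: expensive
        "beta": (1, lambda: 0.5),   # bottom-up from parts: cheap
        "gamma": (2, lambda: 0.2),  # top-down from context
    }
    print(schedule(toy, budget=10, epsilon=0.0))
```

In this toy setting the scheduler tries each process once and then keeps choosing whichever one has delivered the most gain per unit cost. The model in the paper instead scores whole parse graphs of the AND-OR graph, which this sketch does not attempt to reproduce.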

Cite

Text

Amer et al. "Cost-Sensitive Top-Down/Bottom-up Inference for Multiscale Activity Recognition." European Conference on Computer Vision, 2012. doi:10.1007/978-3-642-33765-9_14

Markdown

[Amer et al. "Cost-Sensitive Top-Down/Bottom-up Inference for Multiscale Activity Recognition." European Conference on Computer Vision, 2012.](https://mlanthology.org/eccv/2012/amer2012eccv-cost/) doi:10.1007/978-3-642-33765-9_14

BibTeX

@inproceedings{amer2012eccv-cost,
  title     = {{Cost-Sensitive Top-Down/Bottom-up Inference for Multiscale Activity Recognition}},
  author    = {Amer, Mohamed R. and Xie, Dan and Zhao, Mingtian and Todorovic, Sinisa and Zhu, Song Chun},
  booktitle = {European Conference on Computer Vision},
  year      = {2012},
  pages     = {187--200},
  doi       = {10.1007/978-3-642-33765-9_14},
  url       = {https://mlanthology.org/eccv/2012/amer2012eccv-cost/}
}