Action Localization with Tubelets from Motion

Abstract

This paper considers the problem of action localization, where the objective is to determine when and where certain actions appear. We introduce a sampling strategy to produce 2D+t sequences of bounding boxes, called tubelets. Compared to state-of-the-art alternatives, this drastically reduces the number of hypotheses that are likely to include the action of interest. Our method is inspired by a recent technique introduced in the context of image localization. Beyond considering this technique for the first time for videos, we revisit this strategy for 2D+t sequences obtained from super-voxels. Our sampling strategy exploits a criterion that reflects how action-related motion deviates from background motion. We demonstrate the value of our approach through extensive experiments on two public datasets: UCF Sports and MSR-II. Our approach significantly outperforms the state of the art on both datasets, while restricting the search for actions to a fraction of the possible bounding box sequences.
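To make the term concrete, the following is a minimal sketch of what a tubelet is as a data structure: one bounding box per frame, here derived from a single super-voxel. It is an illustration only and not the authors' sampling strategy. The function name `tubelet_from_supervoxel`, the `labels[t][y][x]` layout, and the toy label volume are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Inclusive pixel coordinates: (x_min, y_min, x_max, y_max).
Box = Tuple[int, int, int, int]


def tubelet_from_supervoxel(
    labels: List[List[List[int]]], voxel_id: int
) -> Dict[int, Box]:
    """Return one bounding box per frame for a given super-voxel.

    `labels[t][y][x]` is assumed to hold the super-voxel id of pixel
    (x, y) in frame t. Frames where the super-voxel is absent are skipped.
    """
    tubelet: Dict[int, Box] = {}
    for t, frame in enumerate(labels):
        # Collect the coordinates of every pixel carrying this id.
        coords = [
            (x, y)
            for y, row in enumerate(frame)
            for x, value in enumerate(row)
            if value == voxel_id
        ]
        if coords:
            xs = [x for x, _ in coords]
            ys = [y for _, y in coords]
            tubelet[t] = (min(xs), min(ys), max(xs), max(ys))
    return tubelet


# Hypothetical toy volume: 2 frames of 3x4 pixels, super-voxel 1 moves right.
labels = [
    [[0, 1, 1, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 0]],
    [[0, 0, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 0, 1]],
]

tubelet = tubelet_from_supervoxel(labels, voxel_id=1)
```

In this toy case the resulting sequence of boxes follows the super-voxel as it shifts between frames, which is the 2D+t structure the abstract refers to. How super-voxels are obtained, merged, and ranked by motion is described in the paper itself and is not modeled here.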

Cite

Text

Jain et al. "Action Localization with Tubelets from Motion." Conference on Computer Vision and Pattern Recognition, 2014. doi:10.1109/CVPR.2014.100

Markdown

[Jain et al. "Action Localization with Tubelets from Motion." Conference on Computer Vision and Pattern Recognition, 2014.](https://mlanthology.org/cvpr/2014/jain2014cvpr-action/) doi:10.1109/CVPR.2014.100

BibTeX

@inproceedings{jain2014cvpr-action,
  title     = {{Action Localization with Tubelets from Motion}},
  author    = {Jain, Mihir and van Gemert, Jan and Jegou, Herve and Bouthemy, Patrick and Snoek, Cees G.M.},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2014},
  doi       = {10.1109/CVPR.2014.100},
  url       = {https://mlanthology.org/cvpr/2014/jain2014cvpr-action/}
}