Actor-Action Semantic Segmentation with Grouping Process Models

Abstract

Actor-action semantic segmentation is an important step toward advanced video understanding: it asks what action is happening, who is performing it, and where it is happening in space-time. Current methods for this problem, based on layered CRFs, are local and cannot capture long-range interactions between video parts. We propose a new model that combines the labeling CRF with a supervoxel hierarchy, in which supervoxels at various scales provide cues for possible groupings of CRF nodes and so encourage adaptive, long-range interactions. The new model defines a dynamic and continuous process of information exchange: the CRF influences which supervoxels in the hierarchy are active, and these active supervoxels, in turn, affect the connectivity of the CRF; we hence call it a grouping process model. By further incorporating video-level recognition, the proposed method achieves a 60% relative improvement over the state of the art on the recent large-scale A2D video labeling dataset, which demonstrates the effectiveness of our modeling.
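
The two-way exchange described in the abstract can be illustrated with a toy sketch. This is not the paper's energy function or inference procedure: the function names, the purity threshold, the vote weight, and the representation of the hierarchy as plain lists of node indices are all assumptions made for illustration. It only shows the alternation, in which the current labeling decides which groups are active and the active groups then link their member nodes.

```python
from collections import Counter


def active_groups(labels, groups, purity=0.75):
    """Labeling -> hierarchy: a group (hypothetical stand-in for a supervoxel)
    is active when enough of its nodes currently share one label."""
    active = []
    for members in groups:
        counts = Counter(labels[i] for i in members)
        top_count = counts.most_common(1)[0][1]
        if top_count / len(members) >= purity:
            active.append(members)
    return active


def relabel(unary, labels, active, weight=1.0):
    """Hierarchy -> labeling: every active group connects its member nodes,
    so each node receives label votes from the other members."""
    updated = []
    for i, scores in enumerate(unary):
        total = dict(scores)  # start from the node's own (unary) scores
        for members in active:
            if i not in members:
                continue
            for j in members:
                if j != i:
                    vote = weight / (len(members) - 1)
                    total[labels[j]] = total.get(labels[j], 0.0) + vote
        updated.append(max(total, key=total.get))
    return updated


def grouping_process(unary, groups, iters=5):
    """Alternate the two steps until the labeling stops changing."""
    labels = [max(scores, key=scores.get) for scores in unary]
    for _ in range(iters):
        active = active_groups(labels, groups)
        updated = relabel(unary, labels, active)
        if updated == labels:
            break
        labels = updated
    return labels


# Four nodes with made-up scores; node 3 weakly prefers "run" on its own.
unary = [
    {"walk": 0.9, "run": 0.1},
    {"walk": 0.8, "run": 0.2},
    {"walk": 0.7, "run": 0.3},
    {"walk": 0.4, "run": 0.6},
]
# Two small groups and one coarse group covering all nodes.
groups = [[0, 1], [2, 3], [0, 1, 2, 3]]
result = grouping_process(unary, groups)
```

In this made-up example the coarse group is active under the initial labeling, so node 3 receives votes from nodes it would not otherwise be connected to, which is the kind of long-range interaction the abstract refers to.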

Cite

Text

Xu and Corso. "Actor-Action Semantic Segmentation with Grouping Process Models." Conference on Computer Vision and Pattern Recognition, 2016. doi:10.1109/CVPR.2016.336

Markdown

[Xu and Corso. "Actor-Action Semantic Segmentation with Grouping Process Models." Conference on Computer Vision and Pattern Recognition, 2016.](https://mlanthology.org/cvpr/2016/xu2016cvpr-actoraction/) doi:10.1109/CVPR.2016.336

BibTeX

@inproceedings{xu2016cvpr-actoraction,
  title     = {{Actor-Action Semantic Segmentation with Grouping Process Models}},
  author    = {Xu, Chenliang and Corso, Jason J.},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2016},
  doi       = {10.1109/CVPR.2016.336},
  url       = {https://mlanthology.org/cvpr/2016/xu2016cvpr-actoraction/}
}