Walking and Talking: A Bilinear Approach to Multi-Label Action Recognition

Abstract

Action recognition is a fundamental problem in computer vision. However, all the current approaches pose the problem in a multi-class setting, where each actor is modeled as performing a single action at a time. In this work we pose action recognition as a multi-label problem, i.e., an actor can be performing any plausible subset of actions. Determining which subsets of labels can co-occur is usually treated as a separate problem, with the label correlations either modeled sparsely or fixed a priori to label correlation coefficients. In contrast, we formulate multi-label training and label correlation estimation as a joint max-margin bilinear classification problem. Our joint approach effectively trains discriminative bilinear classifiers that leverage label correlations. To evaluate our approach, we relabeled the UCLA Courtyard dataset for the multi-label setting. We demonstrate that our joint model outperforms baselines on the same task and report state-of-the-art per-label accuracies on the dataset.
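To make the contrast between the multi-class and multi-label settings concrete, the following is a minimal sketch, not the authors' implementation. It assumes one possible bilinear form in which per-label scores are computed as `C @ (W @ x)`, where `W` holds per-label linear classifiers and `C` is a label-correlation matrix; the abstract does not state the exact form. The function names, the toy values, the third label "sitting", and the zero decision threshold are all hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical label set; "walking" and "talking" come from the title,
# "sitting" is an assumed third label for illustration only.
LABELS = ["walking", "talking", "sitting"]


def bilinear_scores(x, W, C):
    """Per-label scores s = C @ (W @ x) (assumed form, not from the paper).

    x: (num_features,) feature vector for one actor.
    W: (num_labels, num_features) per-label linear classifiers.
    C: (num_labels, num_labels) label-correlation matrix.
    The score is linear in W for fixed C and linear in C for fixed W,
    which is what makes it bilinear in the two parameter sets.
    """
    return C @ (W @ x)


def predict_multi_class(scores):
    """Multi-class setting: exactly one action, the highest-scoring one."""
    return {int(np.argmax(scores))}


def predict_multi_label(scores, threshold=0.0):
    """Multi-label setting: every action whose score exceeds the threshold."""
    return {int(k) for k in np.flatnonzero(scores > threshold)}


def multilabel_hinge_loss(scores, y):
    """Sum of per-label hinge losses for targets y in {-1, +1}.

    A generic max-margin surrogate, used here as a stand-in for the
    paper's objective, which the abstract does not spell out.
    """
    return float(np.maximum(0.0, 1.0 - y * scores).sum())


# Toy parameters (assumed values).
x = np.array([1.0, 0.5])
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, -1.0]])
C = np.array([[1.0, 0.5, 0.0],   # walking and talking reinforce each other
              [0.5, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

scores = bilinear_scores(x, W, C)
single = predict_multi_class(scores)
subset = predict_multi_label(scores)
print([LABELS[k] for k in sorted(single)])
print([LABELS[k] for k in sorted(subset)])
```

In this toy setup the multi-class rule returns a single action, while the multi-label rule can return both "walking" and "talking" for the same actor. The off-diagonal entries of `C` let evidence for one label raise the score of a correlated label, which is the intuition behind estimating label correlations jointly with the classifiers.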

Cite

Text

Khamis and Davis. "Walking and Talking: A Bilinear Approach to Multi-Label Action Recognition." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2015. doi:10.1109/CVPRW.2015.7301277

Markdown

[Khamis and Davis. "Walking and Talking: A Bilinear Approach to Multi-Label Action Recognition." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2015.](https://mlanthology.org/cvprw/2015/khamis2015cvprw-walking/) doi:10.1109/CVPRW.2015.7301277

BibTeX

@inproceedings{khamis2015cvprw-walking,
  title     = {{Walking and Talking: A Bilinear Approach to Multi-Label Action Recognition}},
  author    = {Khamis, Sameh and Davis, Larry S.},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2015},
  pages     = {1--8},
  doi       = {10.1109/CVPRW.2015.7301277},
  url       = {https://mlanthology.org/cvprw/2015/khamis2015cvprw-walking/}
}