Effective Codebooks for Human Action Categorization

Abstract

In this paper we propose a new method for human action categorization that combines novel gradient and optic flow descriptors with a more effective codebook, one that models the ambiguity of feature assignment in the traditional bag-of-words model. Recent approaches have represented video sequences using a bag of spatio-temporal visual words, following the successful results achieved in object and scene classification. Codebooks are usually obtained by k-means clustering and hard assignment of visual features to the best representing codeword. Our contribution is twofold. First, we define a new 3D gradient descriptor that, combined with optic flow, outperforms the state of the art without requiring fine parameter tuning. Second, we show that for spatio-temporal features the popular k-means algorithm is insufficient: cluster centers are attracted by the denser regions of the sample distribution, which yields a non-uniform description of the feature space and fails to code other informative regions. Therefore, we apply a radius-based clustering method and a soft assignment that considers the information of two or more relevant candidates. This approach generates a more effective codebook, resulting in a further improvement in classification performance. We extensively test our approach on the standard KTH and Weizmann action datasets, showing its validity and outperforming other recent approaches.
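The abstract names the two codebook ideas but gives no algorithmic detail, so the following is only a minimal Python sketch of what radius-based clustering and soft assignment could look like. Everything specific in it is an assumption rather than the authors' method: the function names `radius_codebook` and `soft_histogram`, the parameters `radius`, `sigma` and `k`, the greedy "densest neighborhood first" selection rule, and the Gaussian weighting of the `k` nearest codewords.

```python
import numpy as np

def radius_codebook(features, radius):
    """Greedy radius-based clustering (illustrative assumption, not the paper's exact rule).

    Repeatedly take the point with the most neighbors within `radius`,
    use the mean of that neighborhood as a codeword, and remove the
    neighborhood, so dense regions cannot absorb every codeword.
    """
    remaining = np.asarray(features, dtype=float)
    centers = []
    while len(remaining) > 0:
        # Pairwise squared distances between the points still unassigned.
        d2 = ((remaining[:, None, :] - remaining[None, :, :]) ** 2).sum(axis=2)
        within = d2 <= radius ** 2
        best = within.sum(axis=1).argmax()
        centers.append(remaining[within[best]].mean(axis=0))
        # Each point is within radius of itself, so at least one point is removed.
        remaining = remaining[~within[best]]
    return np.array(centers)

def soft_histogram(features, codebook, sigma, k=2):
    """Bag-of-words histogram where each feature votes for its k nearest codewords.

    Votes use Gaussian weights (an assumed kernel) normalized to sum to 1 per feature.
    """
    features = np.asarray(features, dtype=float)
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    k = min(k, codebook.shape[0])
    nearest = np.argsort(d2, axis=1)[:, :k]
    rows = np.arange(len(features))[:, None]
    d2_sel = d2[rows, nearest]
    # Subtract the per-row minimum before exponentiating to avoid underflow to all zeros.
    w = np.exp(-(d2_sel - d2_sel.min(axis=1, keepdims=True)) / (2 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)
    hist = np.zeros(codebook.shape[0])
    np.add.at(hist, nearest, w)  # accumulate the weighted votes per codeword
    return hist / hist.sum()

# Toy data: a dense group near the origin and one isolated point.
toy = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [10.0, 10.0]])
toy_codebook = radius_codebook(toy, radius=1.0)
toy_hist = soft_histogram(toy, toy_codebook, sigma=1.0, k=2)
```

In this toy setting the isolated point receives its own codeword regardless of how many samples fall in the dense group, which is the uniform-coverage property the abstract contrasts with k-means. Setting `k=1` reduces `soft_histogram` to the hard assignment used in the traditional bag-of-words model.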

Cite

Text

Ballan et al. "Effective Codebooks for Human Action Categorization." IEEE/CVF International Conference on Computer Vision Workshops, 2009. doi:10.1109/ICCVW.2009.5457658

Markdown

[Ballan et al. "Effective Codebooks for Human Action Categorization." IEEE/CVF International Conference on Computer Vision Workshops, 2009.](https://mlanthology.org/iccvw/2009/ballan2009iccvw-effective/) doi:10.1109/ICCVW.2009.5457658

BibTeX

@inproceedings{ballan2009iccvw-effective,
  title     = {{Effective Codebooks for Human Action Categorization}},
  author    = {Ballan, Lamberto and Bertini, Marco and Del Bimbo, Alberto and Seidenari, Lorenzo and Serra, Giuseppe},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2009},
  pages     = {506--513},
  doi       = {10.1109/ICCVW.2009.5457658},
  url       = {https://mlanthology.org/iccvw/2009/ballan2009iccvw-effective/}
}