Boosting VLAD with Supervised Dictionary Learning and High-Order Statistics

Abstract

Recent studies show that aggregating local descriptors into a super vector yields an effective representation for retrieval and classification tasks. A popular method along this line is the vector of locally aggregated descriptors (VLAD), which aggregates the residuals between descriptors and visual words. However, the original VLAD ignores high-order statistics of local descriptors, and its dictionary may not be optimal for classification tasks. In this paper, we address these problems by utilizing high-order statistics of local descriptors and performing supervised dictionary learning. The main contributions are twofold. Firstly, we propose a high-order VLAD (H-VLAD) for visual recognition, which leverages two kinds of high-order statistics in the VLAD-like framework, namely diagonal covariance and skewness. These high-order statistics provide complementary information for VLAD and allow for efficient computation. Secondly, to further boost the performance of H-VLAD, we design a supervised dictionary learning algorithm to discriminatively refine the dictionary, which can also be extended to other super-vector-based encoding methods. We examine the effectiveness of our methods in image-based object categorization and video-based action recognition. Extensive experiments on the PASCAL VOC 2007, HMDB51, and UCF101 datasets show that our method achieves state-of-the-art performance on both tasks.
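As a rough illustration of the encoding idea described above, the following NumPy sketch aggregates first-order residuals (as in VLAD) and appends per-cluster diagonal variance and skewness. It is a minimal sketch under stated assumptions, not the paper's formulation: the function name `hvlad_sketch`, the hard nearest-word assignment, the use of raw per-cluster variance and standardized third moments, and the single global L2 normalization are all choices made here for illustration, and the supervised dictionary learning step is not shown.

```python
import numpy as np


def hvlad_sketch(X, centers, eps=1e-12):
    """Illustrative VLAD-like encoding with extra per-dimension statistics.

    X:       (n, d) array of local descriptors.
    centers: (k, d) array of visual words (e.g. from k-means).
    Returns a vector of length 3 * k * d. All design choices here are
    assumptions for illustration, not the paper's exact definitions.
    """
    k, d = centers.shape
    # Hard-assign each descriptor to its nearest visual word.
    dist2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    assign = dist2.argmin(axis=1)

    first = np.zeros((k, d))   # sum of residuals (the VLAD part)
    second = np.zeros((k, d))  # diagonal variance per visual word
    third = np.zeros((k, d))   # skewness per visual word

    for j in range(k):
        members = X[assign == j]
        if len(members) == 0:
            continue  # empty cluster contributes zeros
        first[j] = (members - centers[j]).sum(axis=0)
        centered = members - members.mean(axis=0)
        var = (centered ** 2).mean(axis=0)
        second[j] = var
        # Standardized third moment; eps guards against zero variance.
        third[j] = ((centered / np.sqrt(var + eps)) ** 3).mean(axis=0)

    v = np.concatenate([first.ravel(), second.ravel(), third.ravel()])
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


# Example with hypothetical random data: 200 descriptors, 8 dims, 4 words.
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(200, 8))
words = rng.normal(size=(4, 8))
encoding = hvlad_sketch(descriptors, words)
```

The three blocks are simply concatenated, so the output is three times the length of a plain VLAD vector with the same dictionary; the per-dimension (diagonal) statistics keep the cost linear in the descriptor dimension.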

Cite

Text

Peng et al. "Boosting VLAD with Supervised Dictionary Learning and High-Order Statistics." European Conference on Computer Vision, 2014. doi:10.1007/978-3-319-10578-9_43

Markdown

[Peng et al. "Boosting VLAD with Supervised Dictionary Learning and High-Order Statistics." European Conference on Computer Vision, 2014.](https://mlanthology.org/eccv/2014/peng2014eccv-boosting/) doi:10.1007/978-3-319-10578-9_43

BibTeX

@inproceedings{peng2014eccv-boosting,
  title     = {{Boosting VLAD with Supervised Dictionary Learning and High-Order Statistics}},
  author    = {Peng, Xiaojiang and Wang, Limin and Qiao, Yu and Peng, Qiang},
  booktitle = {European Conference on Computer Vision},
  year      = {2014},
  pages     = {660-674},
  doi       = {10.1007/978-3-319-10578-9_43},
  url       = {https://mlanthology.org/eccv/2014/peng2014eccv-boosting/}
}