Scene Classification with Semantic Fisher Vectors

Mandar Dixit, Si Chen, Dashan Gao, Nikhil Rasiwasia, Nuno Vasconcelos

CVPR 2015

doi:10.1109/CVPR.2015.7298916

Abstract

With the help of a convolutional neural network (CNN) trained to recognize objects, a scene image is represented as a bag of semantics (BoS). This involves classifying image patches using the network and treating the class posterior probability vectors as locally extracted semantic descriptors. The image BoS is summarized using a Fisher vector (FV) embedding that exploits the properties of the space of these descriptors. The resulting representation is referred to as a semantic Fisher vector. Two implementations of a semantic FV are investigated. The first involves modeling the BoS with a Dirichlet mixture and computing the Fisher gradients for this model. Due to the difficulty of mixture modeling on a non-Euclidean probability simplex, this approach is shown to be unsuccessful. A second implementation is derived using the interpretation of semantic descriptors as parameters of a multinomial distribution. Like the parameters of any exponential family, these can be projected into their natural parameter space. For a CNN, this is shown to be equivalent to using the inputs of its soft-max layer as patch descriptors. A semantic FV is then computed as a Gaussian mixture FV in the space of these natural parameters. This representation is shown to outperform alternatives such as FVs of features from the intermediate CNN layers or a classifier obtained by adapting (fine-tuning) the CNN. The proposed FV represents an embedding for object classification probabilities. As an image representation, therefore, it is complementary to the features obtained from a scene classification CNN. A combination of the two representations is shown to achieve state-of-the-art results on the MIT Indoor scenes and SUN datasets.
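The second implementation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' code: the patch logits, the mixture parameters, and the function names (`natural_params`, `fisher_vector_means`) are all hypothetical, and the FV uses the commonly cited gradient with respect to the means of a diagonal-covariance Gaussian mixture, which may differ from the exact variant in the paper. The sketch first checks that the multinomial natural parameters (log-odds against a reference class) match the soft-max inputs up to a per-patch shift, then pools those descriptors into a Fisher vector.

```python
import numpy as np

def softmax(z):
    # Numerically stable soft-max over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def natural_params(p):
    # Multinomial natural parameters as log-odds relative to the last class.
    logp = np.log(p)
    return logp - logp[..., -1:]

def fisher_vector_means(x, weights, means, sigmas):
    # x: (N, D) descriptors; weights: (K,); means, sigmas: (K, D).
    # Returns the FV gradient w.r.t. the mixture means, flattened to (K * D,).
    diff = (x[:, None, :] - means[None]) / sigmas[None]            # (N, K, D)
    log_post = (-0.5 * (diff ** 2).sum(-1)
                - np.log(sigmas).sum(-1)[None]
                + np.log(weights)[None])                           # (N, K)
    log_post -= log_post.max(axis=1, keepdims=True)
    gamma = np.exp(log_post)
    gamma /= gamma.sum(axis=1, keepdims=True)                      # posteriors
    grad = (gamma[:, :, None] * diff).sum(0)                       # (K, D)
    grad /= x.shape[0] * np.sqrt(weights)[:, None]
    return grad.ravel()

rng = np.random.default_rng(0)
N, D, K = 50, 4, 3                       # hypothetical patches, classes, components
logits = rng.normal(size=(N, D))         # stand-in for soft-max layer inputs
probs = softmax(logits)                  # semantic descriptors on the simplex
eta = natural_params(probs)

# Natural parameters equal the logits up to a per-patch constant shift.
shift_ok = np.allclose(eta, logits - logits[:, -1:])

# Hypothetical mixture parameters; in practice these would be fit to training data.
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D))
sigmas = np.ones((K, D))
fv = fisher_vector_means(logits, weights, means, sigmas)
```

Because the log-odds and the logits differ only by a constant per patch, working directly with the soft-max inputs avoids taking logarithms of near-zero probabilities.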


Cite

Text

Dixit et al. "Scene Classification with Semantic Fisher Vectors." Conference on Computer Vision and Pattern Recognition, 2015. doi:10.1109/CVPR.2015.7298916

Markdown

[Dixit et al. "Scene Classification with Semantic Fisher Vectors." Conference on Computer Vision and Pattern Recognition, 2015.](https://mlanthology.org/cvpr/2015/dixit2015cvpr-scene/) doi:10.1109/CVPR.2015.7298916

BibTeX

@inproceedings{dixit2015cvpr-scene,
  title     = {{Scene Classification with Semantic Fisher Vectors}},
  author    = {Dixit, Mandar and Chen, Si and Gao, Dashan and Rasiwasia, Nikhil and Vasconcelos, Nuno},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2015},
  doi       = {10.1109/CVPR.2015.7298916},
  url       = {https://mlanthology.org/cvpr/2015/dixit2015cvpr-scene/}
}