VLAD Is Not Necessary for CNN
Abstract
Global convolutional neural network (CNN) activations lack geometric invariance. To address this problem, Gong et al. proposed multi-scale orderless pooling (MOP-CNN), which extracts CNN activations for local patches at multiple scale levels and performs orderless VLAD pooling to form features. However, we find that this method improves performance mainly because it extracts global and local representations simultaneously; VLAD pooling is not necessary, as the representations extracted by the CNN are already good enough for classification. In this paper, we propose a new method to extract multi-scale CNN features, leading to a new deep learning structure. The method extracts CNN representations for local patches at multiple scale levels, concatenates all the representations at each level separately, and finally concatenates the results of all levels. The CNN is trained on the ImageNet dataset to extract features and is then transferred to other datasets. Experimental results on the MIT Indoor and Caltech-101 datasets show that the performance of our proposed method is superior to that of MOP-CNN.
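The multi-scale scheme in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: `extract_cnn_features` is a hypothetical stand-in for the ImageNet-trained CNN (returning a fixed-size activation vector), and the patch sizes and strides are illustrative assumptions.

```python
import numpy as np

FEATURE_DIM = 4096  # assumed size of the CNN activation vector per patch


def extract_cnn_features(patch):
    # Hypothetical placeholder for a real CNN forward pass
    # (e.g. fully-connected-layer activations); deterministic dummy here.
    rng = np.random.default_rng(patch.size)
    return rng.standard_normal(FEATURE_DIM)


def extract_patches(image, patch_size, stride):
    """Slide a square window over the image and yield local patches."""
    h, w = image.shape[:2]
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            yield image[y:y + patch_size, x:x + patch_size]


def multiscale_concat_features(image, patch_sizes=(256, 128, 64)):
    """Concatenate patch features within each scale level, then across levels.

    This replaces MOP-CNN's VLAD pooling with plain concatenation, as the
    abstract describes. Patch sizes/strides are illustrative assumptions.
    """
    level_features = []
    for size in patch_sizes:
        feats = [extract_cnn_features(p)
                 for p in extract_patches(image, size, stride=size)]
        # Concatenate all patch representations at this scale level.
        level_features.append(np.concatenate(feats))
    # Finally, concatenate the per-level results into one descriptor.
    return np.concatenate(level_features)


image = np.zeros((256, 256, 3))
feat = multiscale_concat_features(image)
# 1 + 4 + 16 patches of 4096 dims each -> 86016-dimensional descriptor
```

The resulting descriptor grows with the number of patches per level, so in practice a dimensionality-reduction step (or smaller activation layers) would likely be needed before classification.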
Cite
Text
Yu and Wu. "VLAD Is Not Necessary for CNN." European Conference on Computer Vision Workshops, 2016. doi:10.1007/978-3-319-49409-8_41
Markdown
[Yu and Wu. "VLAD Is Not Necessary for CNN." European Conference on Computer Vision Workshops, 2016.](https://mlanthology.org/eccvw/2016/yu2016eccvw-vlad/) doi:10.1007/978-3-319-49409-8_41
BibTeX
@inproceedings{yu2016eccvw-vlad,
title = {{VLAD Is Not Necessary for CNN}},
author = {Yu, Dan and Wu, Xiao-Jun},
booktitle = {European Conference on Computer Vision Workshops},
year = {2016},
pages = {492-499},
doi = {10.1007/978-3-319-49409-8_41},
url = {https://mlanthology.org/eccvw/2016/yu2016eccvw-vlad/}
}