Learning Multi-Attention Convolutional Neural Network for Fine-Grained Image Recognition
Abstract
Recognizing fine-grained categories (e.g., bird species) relies heavily on discriminative part localization and part-based fine-grained feature learning. Existing approaches predominantly solve these two challenges independently, neglecting the fact that part localization (e.g., the head of a bird) and fine-grained feature learning (e.g., head shape) are mutually correlated. In this paper, we propose a novel part learning approach based on a multi-attention convolutional neural network (MA-CNN), in which part generation and feature learning reinforce each other. MA-CNN consists of convolution, channel grouping, and part classification sub-networks. The channel grouping network takes feature channels from the convolutional layers as input and generates multiple parts by clustering, weighting, and pooling spatially correlated channels. The part classification network then classifies an image from each individual part, through which more discriminative fine-grained features can be learned. Two losses are proposed to guide the multi-task learning of channel grouping and part classification; they encourage MA-CNN to generate more discriminative parts from feature channels and to learn better fine-grained features from parts in a mutually reinforcing way. MA-CNN does not need bounding box or part annotations and can be trained end-to-end. We incorporate the learned parts from MA-CNN with part-CNN for recognition, and show the best performance on three challenging published fine-grained datasets, namely CUB-Birds, FGVC-Aircraft, and Stanford-Cars.
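The "clustering, weighting and pooling" step described in the abstract can be illustrated with a rough NumPy sketch. The abstract gives no implementation details, so everything concrete here is an assumption made for illustration: channels are clustered with a plain k-means on their peak-response locations, the weighting is a hard 0/1 group membership, and the part attention map is a normalized sigmoid of the summed group channels. The function names `group_channels` and `part_features` are hypothetical, and the sketch is not the authors' trained sub-network.

```python
import numpy as np


def group_channels(features, num_parts, num_iters=10, seed=0):
    """Cluster the channels of a (C, H, W) feature map by peak location.

    Assumption: channels whose strongest responses fall at nearby spatial
    positions are treated as "spatially correlated" and grouped together.
    """
    c, h, w = features.shape
    peak = features.reshape(c, h * w).argmax(axis=1)
    coords = np.stack([peak // w, peak % w], axis=1).astype(float)  # (C, 2)

    # Plain k-means on the peak coordinates (illustrative choice).
    rng = np.random.default_rng(seed)
    centers = coords[rng.choice(c, size=num_parts, replace=False)]
    for _ in range(num_iters):
        dists = ((coords[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for k in range(num_parts):
            if np.any(labels == k):  # leave empty clusters where they are
                centers[k] = coords[labels == k].mean(axis=0)

    # Final assignment against the updated centers.
    dists = ((coords[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def part_features(features, labels, num_parts):
    """Weight channels per group, build an attention map, and pool.

    Returns an array of shape (num_parts, C): one pooled feature per part.
    """
    parts = []
    for k in range(num_parts):
        weights = (labels == k).astype(float)             # hard 0/1 weights
        summed = np.tensordot(weights, features, axes=1)  # (H, W)
        attention = 1.0 / (1.0 + np.exp(-summed))         # sigmoid
        attention = attention / attention.sum()           # sums to 1
        pooled = (features * attention[None]).sum(axis=(1, 2))  # (C,)
        parts.append(pooled)
    return np.stack(parts)


if __name__ == "__main__":
    feats = np.random.default_rng(1).random((16, 7, 7))
    groups = group_channels(feats, num_parts=4)
    print(groups.shape, part_features(feats, groups, num_parts=4).shape)
```

In a trainable version, the hard membership weights would be replaced by learned, differentiable channel weights so that the grouping can be optimized jointly with the part classifiers, which is the mutual reinforcement the abstract describes.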
Cite
Text
Zheng et al. "Learning Multi-Attention Convolutional Neural Network for Fine-Grained Image Recognition." International Conference on Computer Vision, 2017. doi:10.1109/ICCV.2017.557
Markdown
[Zheng et al. "Learning Multi-Attention Convolutional Neural Network for Fine-Grained Image Recognition." International Conference on Computer Vision, 2017.](https://mlanthology.org/iccv/2017/zheng2017iccv-learning-a/) doi:10.1109/ICCV.2017.557
BibTeX
@inproceedings{zheng2017iccv-learning-a,
title = {{Learning Multi-Attention Convolutional Neural Network for Fine-Grained Image Recognition}},
author = {Zheng, Heliang and Fu, Jianlong and Mei, Tao and Luo, Jiebo},
booktitle = {International Conference on Computer Vision},
year = {2017},
doi = {10.1109/ICCV.2017.557},
url = {https://mlanthology.org/iccv/2017/zheng2017iccv-learning-a/}
}