Cross-X Learning for Fine-Grained Visual Categorization

Abstract

Recognizing objects from subcategories with very subtle differences remains a challenging task due to the large intra-class and small inter-class variation. Recent work tackles this problem in a weakly-supervised manner: object parts are first detected and the corresponding part-specific features are extracted for fine-grained classification. However, these methods typically treat the part-specific features of each image in isolation while neglecting the relationships between features of different images. In this paper, we propose Cross-X learning, a simple yet effective approach that exploits the relationships between different images and between different network layers for robust multi-scale feature learning. Our approach involves two novel components: (i) a cross-category cross-semantic regularizer that guides the extracted features to represent semantic parts, and (ii) a cross-layer regularizer that improves the robustness of multi-scale features by matching the prediction distributions across multiple layers. Our approach can be easily trained end-to-end and is scalable to large datasets like NABirds. We empirically analyze the contributions of different components of our approach and demonstrate its robustness, effectiveness, and state-of-the-art performance on five benchmark datasets. Code is available at https://github.com/cswluo/CrossX.
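The abstract does not give the exact form of the cross-layer regularizer, so the following is only a minimal sketch of what "matching the prediction distributions across multiple layers" could look like. It assumes each layer produces its own class logits and uses a symmetric KL divergence averaged over layer pairs as the matching criterion. The function names (`softmax`, `kl_divergence`, `cross_layer_penalty`) and the choice of symmetric KL are illustrative assumptions, not the authors' implementation; see the linked repository for the real code.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_layer_penalty(per_layer_logits):
    """Average symmetric KL between the predictions of every pair of layers.

    ASSUMPTION: symmetric KL is a stand-in for whatever divergence the
    paper actually uses; per_layer_logits holds one logit vector per layer
    for a single image.
    """
    probs = [softmax(logits) for logits in per_layer_logits]
    pairs = [(i, j) for i in range(len(probs)) for j in range(i + 1, len(probs))]
    if not pairs:
        return 0.0  # a single layer has nothing to match against
    total = sum(
        kl_divergence(probs[i], probs[j]) + kl_divergence(probs[j], probs[i])
        for i, j in pairs
    )
    return total / len(pairs)
```

In such a setup the penalty would be added to the usual classification loss with a weighting coefficient, so that layers at different scales are encouraged to agree on their predicted class distribution. The penalty is zero when all layers predict the same distribution and grows as they diverge.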

Cite

Text

Luo et al. "Cross-X Learning for Fine-Grained Visual Categorization." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019. doi:10.1109/ICCV.2019.00833

Markdown

[Luo et al. "Cross-X Learning for Fine-Grained Visual Categorization." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019.](https://mlanthology.org/iccv/2019/luo2019iccv-crossx/) doi:10.1109/ICCV.2019.00833

BibTeX

@inproceedings{luo2019iccv-crossx,
  title     = {{Cross-X Learning for Fine-Grained Visual Categorization}},
  author    = {Luo, Wei and Yang, Xitong and Mo, Xianjie and Lu, Yuheng and Davis, Larry S. and Li, Jun and Yang, Jian and Lim, Ser-Nam},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year      = {2019},
  doi       = {10.1109/ICCV.2019.00833},
  url       = {https://mlanthology.org/iccv/2019/luo2019iccv-crossx/}
}