An Erudite Fine-Grained Visual Classification Model

Abstract

Current fine-grained visual classification (FGVC) models are isolated. In practice, we first need to identify the coarse-grained label of an object, then select the corresponding FGVC model for recognition. This hinders the application of FGVC algorithms in real-life scenarios. In this paper, we propose an erudite FGVC model jointly trained on several different datasets, which can efficiently and accurately predict an object's fine-grained label across the combined label space. We found through a pilot study that positive and negative transfer co-occur when different datasets are mixed for training, i.e., the knowledge from other datasets is not always useful. Therefore, we first propose a feature disentanglement module and a feature re-fusion module to reduce negative transfer and boost positive transfer between datasets. In detail, we reduce negative transfer by decoupling the deep features through many dataset-specific feature extractors. Subsequently, the decoupled features are re-fused channel-wise to facilitate positive transfer. Finally, we propose a meta-learning based dataset-agnostic spatial attention layer to take full advantage of the multi-dataset training data, given that localisation is dataset-agnostic. Experimental results across 11 mixed datasets built from four FGVC datasets demonstrate the effectiveness of the proposed method. Furthermore, the proposed method can be easily combined with existing FGVC methods to obtain state-of-the-art results.
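
To make the idea of a combined label space concrete, the following is a minimal sketch of one common way to merge the labels of several datasets into a single index range: each dataset's local class indices are shifted by a per-dataset offset. This is an illustrative assumption, not necessarily the scheme used in the paper. The dataset names, the class counts, and the helpers `build_offsets`, `to_global` and `to_local` are all hypothetical.

```python
from itertools import accumulate

# Hypothetical dataset names and class counts, for illustration only.
CLASS_COUNTS = {"birds": 200, "cars": 196, "aircraft": 100}


def build_offsets(class_counts):
    """Return ({dataset: first global index}, total number of classes)."""
    names = list(class_counts)
    # Running sum of class counts gives the start index of each dataset.
    starts = [0, *accumulate(class_counts[name] for name in names)]
    return dict(zip(names, starts)), starts[-1]


def to_global(dataset, local_label, offsets, class_counts):
    """Map a dataset-local label to its index in the combined label space."""
    if not 0 <= local_label < class_counts[dataset]:
        raise ValueError(f"label {local_label} out of range for {dataset}")
    return offsets[dataset] + local_label


def to_local(global_label, offsets, class_counts):
    """Map a combined-space label back to (dataset, local label)."""
    for name, start in offsets.items():
        if start <= global_label < start + class_counts[name]:
            return name, global_label - start
    raise ValueError(f"label {global_label} is outside the combined space")


OFFSETS, NUM_CLASSES = build_offsets(CLASS_COUNTS)
```

Under these assumptions, a single classifier head with `NUM_CLASSES` outputs can be trained on the mixed data, and `to_local` recovers which dataset and which fine-grained class a prediction refers to, so no separate coarse-grained routing step is needed.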

Cite

Text

Chang et al. "An Erudite Fine-Grained Visual Classification Model." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.00702

Markdown

[Chang et al. "An Erudite Fine-Grained Visual Classification Model." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/chang2023cvpr-erudite/) doi:10.1109/CVPR52729.2023.00702

BibTeX

@inproceedings{chang2023cvpr-erudite,
  title     = {{An Erudite Fine-Grained Visual Classification Model}},
  author    = {Chang, Dongliang and Tong, Yujun and Du, Ruoyi and Hospedales, Timothy and Song, Yi-Zhe and Ma, Zhanyu},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {7268-7277},
  doi       = {10.1109/CVPR52729.2023.00702},
  url       = {https://mlanthology.org/cvpr/2023/chang2023cvpr-erudite/}
}