Where to Focus: Investigating Hierarchical Attention Relationship for Fine-Grained Visual Classification

Abstract

Object categories are often grouped into a multi-granularity taxonomic hierarchy. Classifying objects at a coarser-grained level of the hierarchy requires global and common characteristics, while classification at a finer-grained level relies on local and discriminative features. Humans should therefore subconsciously focus on different object regions when classifying at different hierarchy levels. This granularity-wise attention is confirmed by the real-time human gaze data we collected on classification tasks at different hierarchy levels. To leverage this mechanism, we propose a Cross-Hierarchical Region Feature (CHRF) learning framework. Specifically, we first design a region feature mining module that imitates humans by learning granularity-wise attention regions through multi-grained classification tasks. To explore how human attention shifts from one hierarchy level to another, we further present a cross-hierarchical orthogonal fusion module that enhances the region feature representation by blending the original feature with an orthogonal component extracted from adjacent hierarchies. Experiments on five hierarchical fine-grained datasets demonstrate the effectiveness of CHRF compared with state-of-the-art methods. The ablation study and visualization results also consistently verify the advantages of our human attention-oriented modules. The code and dataset are available at https://github.com/visiondom/CHRF.
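
The idea of blending a feature with an orthogonal component from an adjacent hierarchy can be illustrated with a small vector sketch. This is not the CHRF implementation: the abstract does not specify the projection direction, the blending rule, or the feature shapes, so the sketch assumes plain 1-D feature vectors, takes the part of the adjacent-hierarchy feature that is orthogonal to the original feature, and adds it to the original. The function names `orthogonal_component` and `fuse` are hypothetical.

```python
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthogonal_component(source: List[float], reference: List[float]) -> List[float]:
    """Return the part of `source` that is orthogonal to `reference`.

    Computed as source - proj_reference(source). If `reference` is the
    zero vector there is nothing to project onto, so `source` is returned.
    """
    denom = dot(reference, reference)
    if denom == 0.0:
        return list(source)
    scale = dot(source, reference) / denom
    return [s - scale * r for s, r in zip(source, reference)]


def fuse(original: List[float], adjacent: List[float]) -> List[float]:
    """Blend `original` with the orthogonal part of `adjacent`.

    Assumption for illustration: blending is a simple element-wise sum.
    """
    orth = orthogonal_component(adjacent, original)
    return [o + c for o, c in zip(original, orth)]


if __name__ == "__main__":
    fine = [2.0, 0.0]    # hypothetical fine-grained region feature
    coarse = [3.0, 4.0]  # hypothetical adjacent (coarser) region feature
    print(orthogonal_component(coarse, fine))
    print(fuse(fine, coarse))
```

Under these assumptions the fused vector keeps the original feature intact and adds only the information from the adjacent hierarchy that the original does not already contain along its own direction.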

Cite

Text

Liu et al. "Where to Focus: Investigating Hierarchical Attention Relationship for Fine-Grained Visual Classification." Proceedings of the European Conference on Computer Vision (ECCV), 2022. doi:10.1007/978-3-031-20053-3_4

Markdown

[Liu et al. "Where to Focus: Investigating Hierarchical Attention Relationship for Fine-Grained Visual Classification." Proceedings of the European Conference on Computer Vision (ECCV), 2022.](https://mlanthology.org/eccv/2022/liu2022eccv-focus/) doi:10.1007/978-3-031-20053-3_4

BibTeX

@inproceedings{liu2022eccv-focus,
  title     = {{Where to Focus: Investigating Hierarchical Attention Relationship for Fine-Grained Visual Classification}},
  author    = {Liu, Yang and Zhou, Lei and Zhang, Pengcheng and Bai, Xiao and Gu, Lin and Yu, Xiaohan and Zhou, Jun and Hancock, Edwin R.},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2022},
  doi       = {10.1007/978-3-031-20053-3_4},
  url       = {https://mlanthology.org/eccv/2022/liu2022eccv-focus/}
}