Down-Sampling Inter-Layer Adapter for Parameter and Computation Efficient Ultra-Fine-Grained Image Recognition

Abstract

Ultra-fine-grained image recognition (UFGIR) categorizes objects with extremely small differences between classes, such as distinguishing between cultivars within the same species, as opposed to the species-level classification of fine-grained image recognition (FGIR). The task is made harder by the scarcity of samples per category. To tackle these challenges, we introduce a novel approach that employs down-sampling inter-layer adapters in a parameter-efficient setting, where the backbone parameters are frozen and only a small set of additional modules is fine-tuned. By integrating dual-branch down-sampling, we significantly reduce the number of parameters and floating-point operations (FLOPs) required, making our method highly efficient. Comprehensive experiments on ten datasets demonstrate that our approach achieves an outstanding accuracy-cost trade-off, highlighting its potential for practical applications in resource-constrained environments. In particular, our method increases the average accuracy by at least 6.8% compared to other methods in the parameter-efficient setting, while requiring at least 123x fewer trainable parameters than current state-of-the-art UFGIR methods and reducing FLOPs by 30% on average compared to other methods.
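
To make the parameter-efficient setting concrete, the following back-of-the-envelope sketch counts the trainable parameters of a generic bottleneck adapter attached to a frozen backbone, and shows how spatial down-sampling of the tokens shrinks the adapter's own compute. It is not the architecture from the paper: the abstract does not specify the adapter design, so the layer shapes, the ViT-Base-like sizes (`DIM`, `LAYERS`, `BACKBONE_PARAMS`), the `BOTTLENECK` width, and the stride-2 pooling are all assumptions chosen for illustration.

```python
def bottleneck_adapter_params(dim: int, bottleneck: int) -> int:
    """Weights and biases of a down-projection followed by an up-projection."""
    down = dim * bottleneck + bottleneck
    up = bottleneck * dim + dim
    return down + up


def adapter_macs(tokens: int, dim: int, bottleneck: int) -> int:
    """Multiply-accumulates for applying both projections to every token."""
    return tokens * 2 * dim * bottleneck


# Hypothetical configuration, not taken from the paper.
DIM = 768                      # token width of a ViT-Base-sized backbone
LAYERS = 12                    # assume one adapter per backbone layer
BOTTLENECK = 16                # assumed hidden width of each adapter
BACKBONE_PARAMS = 86_000_000   # rough size of the frozen backbone

# Only the adapters are trained; the backbone stays frozen.
trainable = LAYERS * bottleneck_adapter_params(DIM, BOTTLENECK)
fraction = trainable / (trainable + BACKBONE_PARAMS)

# Token counts for a 224x224 image split into 16x16 patches,
# before and after an assumed stride-2 spatial down-sampling.
full_grid = 14 * 14
pooled_grid = 7 * 7
macs_full = adapter_macs(full_grid, DIM, BOTTLENECK)
macs_pooled = adapter_macs(pooled_grid, DIM, BOTTLENECK)

print(f"trainable adapter parameters: {trainable:,}")
print(f"share of all parameters: {fraction:.2%}")
print(f"adapter MACs, full vs pooled tokens: {macs_full:,} vs {macs_pooled:,}")
```

Under these assumed sizes, the adapters account for well under 1% of all parameters, and halving each side of the token grid cuts the adapter's multiply-accumulates by a factor of four. These figures only illustrate the general mechanism and are unrelated to the 123x and 30% results reported in the abstract.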

Cite

Text

Rios et al. "Down-Sampling Inter-Layer Adapter for Parameter and Computation Efficient Ultra-Fine-Grained Image Recognition." European Conference on Computer Vision Workshops, 2024. doi:10.1007/978-3-031-91979-4_5

Markdown

[Rios et al. "Down-Sampling Inter-Layer Adapter for Parameter and Computation Efficient Ultra-Fine-Grained Image Recognition." European Conference on Computer Vision Workshops, 2024.](https://mlanthology.org/eccvw/2024/rios2024eccvw-downsampling/) doi:10.1007/978-3-031-91979-4_5

BibTeX

@inproceedings{rios2024eccvw-downsampling,
  title     = {{Down-Sampling Inter-Layer Adapter for Parameter and Computation Efficient Ultra-Fine-Grained Image Recognition}},
  author    = {Rios, Edwin Arkel and Oyerinde, Femiloye and Hu, Min-Chun and Lai, Bo-Cheng},
  booktitle = {European Conference on Computer Vision Workshops},
  year      = {2024},
  pages     = {43--54},
  doi       = {10.1007/978-3-031-91979-4_5},
  url       = {https://mlanthology.org/eccvw/2024/rios2024eccvw-downsampling/}
}