LeMeViT: Efficient Vision Transformer with Learnable Meta Tokens for Remote Sensing Image Interpretation

Jiang, Wentao; Zhang, Jing; Wang, Di; Zhang, Qiming; Wang, Zengmao; Du, Bo

doi:10.24963/ijcai.2024/103

IJCAI 2024 pp. 929-937

Abstract

Ordinal regression bridges regression and classification by assigning objects to ordered classes. While human experts rely on discriminative patch-level features for decisions, current approaches are limited by the availability of only image-level ordinal labels and overlook fine-grained patch-level characteristics. In this paper, we propose a Dual-level Fuzzy Learning with Patch Guidance framework, named DFPG, that learns precise feature-based grading boundaries from ambiguous ordinal labels under patch-level supervision. Specifically, we propose patch-labeling and filtering strategies that enable the model to focus exclusively on patch-level features when only image-level ordinal labels are available. We further design a dual-level fuzzy learning module, which leverages fuzzy logic to quantitatively capture and handle label ambiguity from both patch-wise and channel-wise perspectives. Extensive experiments on various image ordinal regression datasets demonstrate the superiority of our proposed method and further confirm its ability to distinguish samples from difficult-to-classify categories. The code is available at https://github.com/ZJUMAI/DFPG-ord.

Cite

Text

Jiang et al. "LeMeViT: Efficient Vision Transformer with Learnable Meta Tokens for Remote Sensing Image Interpretation." International Joint Conference on Artificial Intelligence, 2024. doi:10.24963/ijcai.2024/103

Markdown

[Jiang et al. "LeMeViT: Efficient Vision Transformer with Learnable Meta Tokens for Remote Sensing Image Interpretation." International Joint Conference on Artificial Intelligence, 2024.](https://mlanthology.org/ijcai/2024/jiang2024ijcai-lemevit/) doi:10.24963/ijcai.2024/103

BibTeX

@inproceedings{jiang2024ijcai-lemevit,
  title     = {{LeMeViT: Efficient Vision Transformer with Learnable Meta Tokens for Remote Sensing Image Interpretation}},
  author    = {Jiang, Wentao and Zhang, Jing and Wang, Di and Zhang, Qiming and Wang, Zengmao and Du, Bo},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {929-937},
  doi       = {10.24963/ijcai.2024/103},
  url       = {https://mlanthology.org/ijcai/2024/jiang2024ijcai-lemevit/}
}