Category-Aware Allocation Transformer for Weakly Supervised Object Localization
Abstract
Weakly supervised object localization (WSOL) aims to localize objects based on only image-level labels as supervision. Recently, transformers have been introduced into WSOL, yielding impressive results. The self-attention mechanism and multilayer perceptron structure in transformers preserve long-range feature dependency, facilitating complete localization of the full object extent. However, current transformer-based methods predict bounding boxes using category-agnostic attention maps, which may lead to confused and noisy object localization. To address this issue, we propose a novel Category-aware Allocation TRansformer (CATR) that learns category-aware representations for specific objects and produces corresponding category-aware attention maps for object localization. First, we introduce a Category-aware Stimulation Module (CSM) to induce learnable category biases for self-attention maps, providing auxiliary supervision to guide the learning of more effective transformer representations. Second, we design an Object Constraint Module (OCM) to refine the object regions for the category-aware attention maps in a self-supervised manner. Extensive experiments on the CUB-200-2011 and ILSVRC datasets demonstrate that the proposed CATR achieves significant and consistent performance improvements over competing approaches.
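The distinction the abstract draws between category-agnostic and category-aware attention maps can be illustrated with a minimal sketch. This is not the CATR implementation: the function names `attention_to_bbox` and `category_aware_bbox`, the `(C, H, W)` map layout, and the `0.5` threshold are all assumptions made for illustration. The box extraction uses a simple threshold over every above-threshold cell, a simplification of the largest-connected-component step that WSOL pipelines commonly use.

```python
import numpy as np

def attention_to_bbox(attn, threshold=0.5):
    """Tight box (x_min, y_min, x_max, y_max) around all cells whose
    min-max normalized attention is >= threshold.
    Returns None when the map is constant (nothing to localize)."""
    attn = np.asarray(attn, dtype=np.float64)
    lo, hi = attn.min(), attn.max()
    if hi == lo:
        return None
    norm = (attn - lo) / (hi - lo)          # scale to [0, 1]
    ys, xs = np.nonzero(norm >= threshold)  # row and column indices
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

def category_aware_bbox(maps, class_id, threshold=0.5):
    """Hypothetical helper: pick the map of one category from an assumed
    (C, H, W) stack, then extract its box. A category-agnostic method
    would instead use a single map shared by all classes."""
    return attention_to_bbox(np.asarray(maps)[class_id], threshold)

# Toy stack with two categories on a 4x4 grid (made-up values).
maps = np.zeros((2, 4, 4))
maps[0, 1:3, 1:3] = 1.0   # category 0 attends to the center
maps[1, 0, 3] = 1.0       # category 1 attends to the top-right cell
print(category_aware_bbox(maps, 0))  # (1, 1, 2, 2)
print(category_aware_bbox(maps, 1))  # (3, 0, 3, 0)
```

In this toy case each category yields a different box from the same image, which is the behavior a single category-agnostic map cannot provide.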
Cite
Text
Chen et al. "Category-Aware Allocation Transformer for Weakly Supervised Object Localization." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.00611
Markdown
[Chen et al. "Category-Aware Allocation Transformer for Weakly Supervised Object Localization." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/chen2023iccv-categoryaware/) doi:10.1109/ICCV51070.2023.00611
BibTeX
@inproceedings{chen2023iccv-categoryaware,
title = {{Category-Aware Allocation Transformer for Weakly Supervised Object Localization}},
author = {Chen, Zhiwei and Ding, Jinren and Cao, Liujuan and Shen, Yunhang and Zhang, Shengchuan and Jiang, Guannan and Ji, Rongrong},
booktitle = {International Conference on Computer Vision},
year = {2023},
pages = {6643--6652},
doi = {10.1109/ICCV51070.2023.00611},
url = {https://mlanthology.org/iccv/2023/chen2023iccv-categoryaware/}
}