Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors

Abstract

Multi-scale features have proven highly effective for object detection but often come with large, sometimes prohibitive, extra computation costs, especially for recent Transformer-based detectors. In this paper, we propose Iterative Multi-scale Feature Aggregation (IMFA), a generic paradigm that enables efficient use of multi-scale features in Transformer-based object detectors. The core idea is to exploit sparse multi-scale features from just a few crucial locations, which is achieved through two novel designs. First, IMFA rearranges the Transformer encoder-decoder pipeline so that the encoded features can be iteratively updated based on the detection predictions. Second, IMFA sparsely samples scale-adaptive features for refined detection from just a few keypoint locations under the guidance of prior detection predictions. As a result, the sampled multi-scale features are sparse yet still highly beneficial for object detection. Extensive experiments show that the proposed IMFA significantly boosts the performance of multiple Transformer-based object detectors with only slight computational overhead.
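The sparse sampling idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `bilinear_sample` and `sample_sparse_multiscale`, the normalized keypoint coordinates, and the softmax-weighted fusion across scales are all assumptions chosen for illustration. The sketch only shows how features from several resolutions might be read at a handful of keypoints instead of processing every location at every scale.

```python
import numpy as np


def bilinear_sample(feat, xy):
    """Sample a (C, H, W) feature map at normalized (x, y) points in [0, 1].

    Returns an array of shape (K, C), one feature vector per point.
    """
    _, h, w = feat.shape
    # Map normalized coordinates to pixel-center coordinates, clamped to the map.
    x = np.clip(xy[:, 0] * w - 0.5, 0, w - 1)
    y = np.clip(xy[:, 1] * h - 0.5, 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = x - x0
    wy = y - y0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bottom = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return (top * (1 - wy) + bottom * wy).T


def sample_sparse_multiscale(feature_maps, keypoints, scale_logits):
    """Fuse features sampled at K keypoints from L feature maps.

    feature_maps: list of L arrays shaped (C, H_l, W_l), same C for all levels.
    keypoints:    (K, 2) normalized (x, y) locations (hypothetical: in a
                  detector these would come from prior predictions).
    scale_logits: (K, L) per-keypoint scale scores (hypothetical stand-in
                  for a learned scale-selection signal).
    Returns a (K, C) array of fused features.
    """
    per_level = np.stack(
        [bilinear_sample(f, keypoints) for f in feature_maps], axis=1
    )  # (K, L, C)
    # Numerically stable softmax over the scale axis.
    weights = np.exp(scale_logits - scale_logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return (per_level * weights[:, :, None]).sum(axis=1)


rng = np.random.default_rng(0)
feature_maps = [rng.standard_normal((8, s, s)) for s in (32, 16, 8)]
keypoints = rng.uniform(size=(5, 2))
scale_logits = np.zeros((5, 3))  # uniform weights across the three scales
tokens = sample_sparse_multiscale(feature_maps, keypoints, scale_logits)
print(tokens.shape)  # (5, 8)
```

In this toy setup, five keypoints yield five fused feature vectors, whereas the three full maps together contain 1,344 spatial locations. That gap is the kind of saving the abstract refers to, though the actual cost profile of IMFA depends on details not given here.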

Cite

Text

Zhang et al. "Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.00601

Markdown

[Zhang et al. "Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/zhang2023cvpr-efficient-b/) doi:10.1109/CVPR52729.2023.00601

BibTeX

@inproceedings{zhang2023cvpr-efficient-b,
  title     = {{Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors}},
  author    = {Zhang, Gongjie and Luo, Zhipeng and Tian, Zichen and Zhang, Jingyi and Zhang, Xiaoqin and Lu, Shijian},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {6206--6216},
  doi       = {10.1109/CVPR52729.2023.00601},
  url       = {https://mlanthology.org/cvpr/2023/zhang2023cvpr-efficient-b/}
}