Not All Tokens Matter All the Time: Dynamic Token Aggregation Towards Efficient Detection Transformers
Abstract
The substantial computational demands of detection transformers (DETRs) hinder their deployment in resource-constrained scenarios, with the encoder consistently emerging as a critical bottleneck. A promising solution lies in reducing token redundancy within the encoder. However, existing methods perform static sparsification and ignore the varying importance that tokens at different feature levels and encoder blocks carry for object detection, leading to suboptimal sparsification and performance degradation. In this paper, we propose Dynamic DETR (Dynamic token aggregation for DEtection TRansformers), a novel strategy that leverages the inherent importance distribution of tokens to control token density and performs multi-level token sparsification. Within each stage, we apply a proximal aggregation paradigm to low-level tokens to maintain spatial integrity, and a holistic strategy to high-level tokens to capture broader contextual information. Furthermore, we propose center-distance regularization to align the token distribution throughout the sparsification process, thereby promoting representation consistency and effectively preserving critical object-specific patterns. Extensive experiments on canonical DETR models demonstrate that Dynamic DETR is broadly applicable across various models and consistently outperforms existing token sparsification methods.
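To make the idea concrete, the sketch below is a minimal, hypothetical PyTorch illustration of level-wise token sparsification with two aggregation modes: tokens are ranked by an importance score, a level-specific fraction is kept, and each pruned token is folded either into its spatially nearest kept token (a stand-in for the proximal paradigm on low-level maps) or into the most feature-similar kept token (a stand-in for the holistic strategy on high-level maps). The function name `aggregate_level`, the cosine-similarity matching, the averaging merge, and the keep ratios are assumptions for illustration only, not the paper's implementation; the center-distance regularization is omitted.

```python
# Minimal sketch (assumptions, not the paper's code): level-wise token
# sparsification with proximal (spatial) vs. holistic (feature-similarity)
# aggregation of the pruned tokens.
import torch
import torch.nn.functional as F


def aggregate_level(tokens, coords, scores, keep_ratio, proximal):
    """Sparsify one feature level.

    tokens: (N, C) token features of this level
    coords: (N, 2) normalized (x, y) centers of the tokens
    scores: (N,)   importance scores, e.g. from a lightweight predictor (assumed)
    keep_ratio: fraction of tokens to keep at this level (assumed per-level)
    proximal: True  -> fold pruned tokens into the nearest kept token in space
              False -> fold pruned tokens into the most similar kept token in feature space
    """
    n = tokens.shape[0]
    k = max(1, int(n * keep_ratio))
    keep_idx = scores.topk(k).indices
    drop_mask = torch.ones(n, dtype=torch.bool)
    drop_mask[keep_idx] = False
    drop_idx = drop_mask.nonzero(as_tuple=True)[0]

    kept = tokens[keep_idx]
    if drop_idx.numel() == 0:
        return kept, coords[keep_idx]

    if proximal:
        # Proximal aggregation: nearest kept token by spatial distance,
        # preserving local structure on high-resolution (low-level) maps.
        assign = torch.cdist(coords[drop_idx], coords[keep_idx]).argmin(dim=1)
    else:
        # Holistic aggregation: most similar kept token by cosine similarity,
        # pooling broader context on low-resolution (high-level) maps.
        sim = F.normalize(tokens[drop_idx], dim=-1) @ F.normalize(kept, dim=-1).T
        assign = sim.argmax(dim=1)

    # Merge each pruned token into its assigned kept token by averaging.
    merged = kept.clone()
    counts = torch.ones(k, 1)
    merged.index_add_(0, assign, tokens[drop_idx])
    counts.index_add_(0, assign, torch.ones(drop_idx.numel(), 1))
    return merged / counts, coords[keep_idx]


# Toy usage with made-up sizes, scores, and keep ratios.
torch.manual_seed(0)
low, low_xy, low_s = torch.randn(400, 256), torch.rand(400, 2), torch.rand(400)
high, high_xy, high_s = torch.randn(100, 256), torch.rand(100, 2), torch.rand(100)
low_out, _ = aggregate_level(low, low_xy, low_s, keep_ratio=0.5, proximal=True)
high_out, _ = aggregate_level(high, high_xy, high_s, keep_ratio=0.7, proximal=False)
print(low_out.shape, high_out.shape)  # torch.Size([200, 256]) torch.Size([70, 256])
```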
Cite
Text
Cheng et al. "Not All Tokens Matter All the Time: Dynamic Token Aggregation Towards Efficient Detection Transformers." Proceedings of the 42nd International Conference on Machine Learning, 2025. https://mlanthology.org/icml/2025/cheng2025icml-all/
BibTeX
@inproceedings{cheng2025icml-all,
title = {{Not All Tokens Matter All the Time: Dynamic Token Aggregation Towards Efficient Detection Transformers}},
author = {Cheng, Jiacheng and Yao, Xiwen and Yuan, Xiang and Han, Junwei},
booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
year = {2025},
pages = {10144--10158},
volume = {267},
url = {https://mlanthology.org/icml/2025/cheng2025icml-all/}
}