Mutual Effort for Efficiency: A Similarity-Based Token Pruning for Vision Transformers in Self-Supervised Learning
Abstract
Self-supervised learning (SSL) offers a compelling alternative to the extensive labeled-data requirements of traditional supervised learning. With the proven success of Vision Transformers (ViTs) in supervised tasks, there is growing interest in adapting them to SSL frameworks. However, although SSL can achieve high accuracy without labeled data, its high computational demands pose substantial challenges, particularly on resource-limited platforms such as edge devices. Recent studies in supervised learning have shown that token pruning can reduce training costs by removing less informative tokens without compromising accuracy. In SSL, however, the dual-branch encoder structure makes traditional single-branch pruning strategies less effective, as they fail to account for critical cross-branch similarity information, leading to reduced accuracy. To this end, we introduce SimPrune, a novel token pruning strategy designed for ViTs in SSL. SimPrune leverages cross-branch similarity information to prune tokens efficiently while retaining essential semantic information across both branches. Additionally, we incorporate a difficulty-aware pruning strategy to further enhance SimPrune's effectiveness. Experimental results show that our approach effectively reduces training computation while maintaining accuracy, offering 24% savings in training cost compared to the SSL baseline without sacrificing accuracy.
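To make the cross-branch idea concrete, below is a minimal, hypothetical sketch of similarity-guided token pruning between the two SSL branches. It is not the authors' SimPrune implementation: the cosine-similarity score, the fixed `keep_ratio`, the `cross_branch_prune` helper, and the assumption that the two augmented views yield positionally aligned token sequences are all illustrative simplifications.

```python
# Illustrative sketch of cross-branch similarity-based token pruning.
# NOTE: this is NOT the SimPrune algorithm from the paper; the scoring rule,
# keep ratio, and the assumption of spatially aligned token sequences
# across the two branches are simplifications for illustration only.

import torch
import torch.nn.functional as F


def cross_branch_prune(tokens_a: torch.Tensor,
                       tokens_b: torch.Tensor,
                       keep_ratio: float = 0.7):
    """Prune patch tokens in both branches using cross-branch similarity.

    tokens_a, tokens_b: (B, N, D) patch tokens from the two SSL branches
    (any [CLS] token is assumed to be handled separately).
    Returns the pruned token tensors and the kept indices.
    """
    # Cosine similarity between corresponding tokens of the two branches.
    sim = F.cosine_similarity(tokens_a, tokens_b, dim=-1)            # (B, N)

    # Keep the tokens whose representations agree most across branches,
    # assuming they carry the semantic content shared by the two views.
    num_keep = max(1, int(sim.size(1) * keep_ratio))
    keep_idx = sim.topk(num_keep, dim=1).indices                     # (B, num_keep)

    gather_idx = keep_idx.unsqueeze(-1).expand(-1, -1, tokens_a.size(-1))
    return (torch.gather(tokens_a, 1, gather_idx),
            torch.gather(tokens_b, 1, gather_idx),
            keep_idx)


# Example usage with random features standing in for ViT patch tokens.
if __name__ == "__main__":
    a = torch.randn(4, 196, 384)   # branch 1: batch of 4, 14x14 patches, dim 384
    b = torch.randn(4, 196, 384)   # branch 2: second augmented view
    a_kept, b_kept, idx = cross_branch_prune(a, b, keep_ratio=0.7)
    print(a_kept.shape, b_kept.shape)   # torch.Size([4, 137, 384]) for each branch
```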
Cite
Text
Li et al. "Mutual Effort for Efficiency: A Similarity-Based Token Pruning for Vision Transformers in Self-Supervised Learning." International Conference on Learning Representations, 2025.Markdown
[Li et al. "Mutual Effort for Efficiency: A Similarity-Based Token Pruning for Vision Transformers in Self-Supervised Learning." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/li2025iclr-mutual/)BibTeX
@inproceedings{li2025iclr-mutual,
title = {{Mutual Effort for Efficiency: A Similarity-Based Token Pruning for Vision Transformers in Self-Supervised Learning}},
author = {Li, Sheng and Tan, Qitao and Dai, Yue and Kong, Zhenglun and Wang, Tianyu and Liu, Jun and Li, Ao and Liu, Ninghao and Ding, Yufei and Tang, Xulong and Yuan, Geng},
booktitle = {International Conference on Learning Representations},
year = {2025},
url = {https://mlanthology.org/iclr/2025/li2025iclr-mutual/}
}