Agglomerative Token Clustering
Abstract
We present Agglomerative Token Clustering (ATC), a novel token merging method that consistently outperforms previous token merging and pruning methods across image classification, image synthesis, and object detection & segmentation tasks. ATC merges clusters through bottom-up hierarchical clustering, without the introduction of extra learnable parameters. We find that ATC achieves state-of-the-art performance across all tasks, and can even perform on par with prior state-of-the-art when applied off-the-shelf, without fine-tuning. ATC is particularly effective when applied with low keep rates, where only a small fraction of tokens are kept and retaining task performance is especially difficult.
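To make the idea of parameter-free, bottom-up merging concrete, the following is a minimal sketch, not the authors' implementation. It assumes details the abstract does not state: cosine distance between token features, average linkage, and representing each merged cluster by the mean of its members. The names `agglomerative_merge`, `cosine_distance`, and `keep_rate` are hypothetical and chosen for illustration only.

```python
import math


def cosine_distance(a, b):
    # Assumes non-zero vectors; 0 means same direction, 2 means opposite.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def agglomerative_merge(tokens, keep_rate):
    """Merge tokens bottom-up until ceil(keep_rate * n) clusters remain.

    Illustrative assumptions: cosine distance, average linkage,
    and mean pooling of each final cluster.
    """
    n = len(tokens)
    target = max(1, math.ceil(keep_rate * n))
    clusters = [[i] for i in range(n)]  # every token starts as its own cluster

    # Pairwise token distances, computed once and reused by the linkage.
    dist = [[cosine_distance(tokens[i], tokens[j]) for j in range(n)]
            for i in range(n)]

    def linkage(c1, c2):
        # Average linkage: mean distance over all cross-cluster pairs.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > target:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]

    dim = len(tokens[0])
    merged = [[sum(tokens[i][k] for i in c) / len(c) for k in range(dim)]
              for c in clusters]
    return merged, clusters


# Four 2-D tokens forming two obvious groups; keep half of them.
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
merged, clusters = agglomerative_merge(tokens, keep_rate=0.5)
```

No weights are trained here: the merge order depends only on the token features, which is what allows a method of this kind to be applied off-the-shelf. The naive pair search above is cubic or worse in the number of tokens and is meant only to show the procedure.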
Cite
Text
Haurum et al. "Agglomerative Token Clustering." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72998-0_12
Markdown
[Haurum et al. "Agglomerative Token Clustering." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/haurum2024eccv-agglomerative/) doi:10.1007/978-3-031-72998-0_12
BibTeX
@inproceedings{haurum2024eccv-agglomerative,
title = {{Agglomerative Token Clustering}},
author = {Haurum, Joakim Bruslund and Escalera, Sergio and Taylor, Graham W. and Moeslund, Thomas B.},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2024},
doi = {10.1007/978-3-031-72998-0_12},
url = {https://mlanthology.org/eccv/2024/haurum2024eccv-agglomerative/}
}