Agglomerative Token Clustering

Abstract

We present Agglomerative Token Clustering (ATC), a novel token merging method that consistently outperforms previous token merging and pruning methods across image classification, image synthesis, and object detection & segmentation tasks. ATC merges clusters through bottom-up hierarchical clustering, without the introduction of extra learnable parameters. We find that ATC achieves state-of-the-art performance across all tasks, and can even perform on par with prior state-of-the-art when applied off-the-shelf, without fine-tuning. ATC is particularly effective when applied with low keep rates, where only a small fraction of tokens are kept and retaining task performance is especially difficult.
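The bottom-up merging described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the cosine distance, the average linkage, the mean-pooling of merged tokens, and the names `agglomerative_token_merge` and `keep_rate` are assumptions made for illustration, since the abstract does not specify them. The sketch starts with one cluster per token and repeatedly merges the two closest clusters until only the requested fraction of tokens remains, using no learnable parameters.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors (assumed distance measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def average_linkage(cluster_a, cluster_b, tokens):
    """Mean pairwise distance between the members of two clusters (assumed linkage)."""
    total = sum(
        cosine_distance(tokens[i], tokens[j]) for i in cluster_a for j in cluster_b
    )
    return total / (len(cluster_a) * len(cluster_b))


def agglomerative_token_merge(tokens, keep_rate):
    """Merge tokens bottom-up until ceil(keep_rate * N) clusters remain.

    Returns the merged tokens (mean of each cluster, an assumed pooling rule)
    and the list of original token indices belonging to each cluster.
    """
    n_keep = max(1, math.ceil(keep_rate * len(tokens)))
    # Start with every token in its own cluster.
    clusters = [[i] for i in range(len(tokens))]

    while len(clusters) > n_keep:
        # Find the pair of clusters with the smallest linkage distance.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], tokens)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge cluster b into cluster a (a < b, so index a stays valid).
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    dim = len(tokens[0])
    merged = [
        [sum(tokens[i][k] for i in c) / len(c) for k in range(dim)] for c in clusters
    ]
    return merged, clusters


# Four toy 2-D tokens forming two obvious groups; keep half of them.
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
merged, clusters = agglomerative_token_merge(tokens, keep_rate=0.5)
print(clusters)  # [[0, 1], [2, 3]]
```

This naive search recomputes every pairwise linkage at each step, so it is only suitable for toy inputs; a practical version would use an optimized hierarchical clustering routine.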

Cite

Text

Haurum et al. "Agglomerative Token Clustering." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72998-0_12

Markdown

[Haurum et al. "Agglomerative Token Clustering." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/haurum2024eccv-agglomerative/) doi:10.1007/978-3-031-72998-0_12

BibTeX

@inproceedings{haurum2024eccv-agglomerative,
  title     = {{Agglomerative Token Clustering}},
  author    = {Haurum, Joakim Bruslund and Escalera, Sergio and Taylor, Graham W. and Moeslund, Thomas B.},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-72998-0_12},
  url       = {https://mlanthology.org/eccv/2024/haurum2024eccv-agglomerative/}
}