Multimodal Promptable Token Merging for Diffusion Models

Abstract

Token compression techniques, such as token merging and pruning, are essential for alleviating the heavy computational burden that a growing number of tokens imposes on attention mechanisms. However, current methods typically evaluate token importance through token-to-token distances or similarity metrics, an approach that is inadequate for the promptable designs and frameworks now gaining prominence. To address this limitation, we introduce a novel and effective merging strategy called “Multimodal Promptable Token Merging” (MPTM). The proposed method takes a multimodal, prompt-centric approach: it measures the proximity between the tokens of each input modality and the multimodal prompt, which lets it efficiently eliminate redundant tokens while preserving information-rich ones. Extensive experiments demonstrate that MPTM significantly reduces computational cost without compromising essential information in generative image tasks. When integrated into diffusion-based detection architectures, MPTM outperforms existing state-of-the-art methods by 2.3% on object detection. Additionally, when applied to multimodal diffusion models, MPTM maintains high-quality output while increasing throughput 2.9-fold, highlighting its versatility.
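The abstract describes scoring tokens by their proximity to a multimodal prompt rather than only to one another. The Python sketch below illustrates that general idea under our own assumptions. It is not the MPTM algorithm from the paper.

The sketch makes the following assumptions:

- The prompt has already been pooled into a single embedding vector.
- Proximity is measured with cosine similarity.
- A fixed number `keep` of the most prompt-relevant tokens is retained.
- Every other token is folded into its most similar kept token with a size-weighted average, in the style of ToMe.

The function name `prompt_guided_merge` and all of its parameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def prompt_guided_merge(
    tokens: Sequence[Sequence[float]],
    prompt: Sequence[float],
    keep: int,
) -> Tuple[List[Vector], List[int]]:
    """Hypothetical prompt-guided token merge (an illustration, not MPTM).

    1. Score every token by cosine similarity to the pooled prompt embedding.
    2. Keep the `keep` highest-scoring tokens as merge targets.
    3. Fold each remaining token into the kept token it most resembles,
       using a size-weighted running average.

    Returns the merged tokens and, for each input token, the index of the
    output token it ended up in.
    """
    if not 1 <= keep <= len(tokens):
        raise ValueError("keep must be between 1 and len(tokens)")
    # Rank tokens by relevance to the prompt (assumed proximity measure).
    order = sorted(range(len(tokens)),
                   key=lambda i: cosine(tokens[i], prompt), reverse=True)
    kept = order[:keep]
    merged = [list(tokens[i]) for i in kept]
    sizes = [1] * keep
    assign = [0] * len(tokens)
    for slot, i in enumerate(kept):
        assign[i] = slot
    for i in order[keep:]:
        # Match against the original kept tokens, not the running averages.
        slot = max(range(keep), key=lambda s: cosine(tokens[i], tokens[kept[s]]))
        n = sizes[slot]
        # Running mean: the output stays the average of all absorbed tokens.
        merged[slot] = [(m * n + x) / (n + 1) for m, x in zip(merged[slot], tokens[i])]
        sizes[slot] = n + 1
        assign[i] = slot
    return merged, assign
```

For example, `prompt_guided_merge(tokens, prompt, keep=2)` on four 2-D tokens returns two merged vectors, together with an assignment list that maps each input token to its output slot. Matching dropped tokens against the original kept tokens, rather than against the running averages, makes each assignment independent of merge order. The size-weighted update keeps every output equal to the mean of the tokens it has absorbed.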

Cite

Text

Hong and Liu. "Multimodal Promptable Token Merging for Diffusion Models." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I16.33894

Markdown

[Hong and Liu. "Multimodal Promptable Token Merging for Diffusion Models." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/hong2025aaai-multimodal/) doi:10.1609/AAAI.V39I16.33894

BibTeX

@inproceedings{hong2025aaai-multimodal,
  title     = {{Multimodal Promptable Token Merging for Diffusion Models}},
  author    = {Hong, Cheng-Yao and Liu, Tyng-Luh},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {17231-17239},
  doi       = {10.1609/AAAI.V39I16.33894},
  url       = {https://mlanthology.org/aaai/2025/hong2025aaai-multimodal/}
}