Unlocking Slot Attention by Changing Optimal Transport Costs

Abstract

Slot attention is a powerful method for object-centric modeling in images and videos. However, its set-equivariance limits its ability to handle videos with a dynamic number of objects because it cannot break ties. To overcome this limitation, we first establish a connection between slot attention and optimal transport. Based on this new perspective, we propose MESH (Minimize Entropy of Sinkhorn): a cross-attention module that combines the tiebreaking properties of unregularized optimal transport with the speed of regularized optimal transport. We evaluate slot attention using MESH on multiple object-centric learning benchmarks and find significant improvements over slot attention in every setting.
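
The tie-breaking issue mentioned in the abstract can be illustrated with a small sketch of entropy-regularized optimal transport solved by Sinkhorn iterations. This is not the authors' MESH module. The function names, the uniform marginals, the regularization strength `eps`, and the toy cost matrices are all assumptions made for illustration. The sketch only shows that two rows with identical costs receive identical transport mass, and that the resulting plan has higher entropy than a plan for costs without ties.

```python
import math


def sinkhorn(cost, eps=0.1, n_iters=50):
    """Entropy-regularized OT with uniform marginals (illustrative sketch only)."""
    n, m = len(cost), len(cost[0])
    # Gibbs kernel K = exp(-C / eps)
    K = [[math.exp(-c / eps) for c in row] for row in cost]
    a = [1.0 / n] * n  # assumed uniform row marginals
    b = [1.0 / m] * m  # assumed uniform column marginals
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def entropy(plan):
    """Shannon entropy of a transport plan, treating zero entries as contributing 0."""
    return -sum(p * math.log(p) for row in plan for p in row if p > 0)


# Hypothetical toy costs: rows play the role of slots, columns of inputs.
tied_cost = [[0.0, 1.0], [0.0, 1.0]]    # both rows have identical costs
untied_cost = [[0.0, 1.0], [1.0, 0.0]]  # each row prefers a different column

tied_plan = sinkhorn(tied_cost)
untied_plan = sinkhorn(untied_cost)

print("tied plan:", tied_plan, "entropy:", entropy(tied_plan))
print("untied plan:", untied_plan, "entropy:", entropy(untied_plan))
```

In the tied case the two rows of the plan are the same, so nothing distinguishes one slot from the other. In the untied case the mass concentrates on the low-cost entries and the entropy is lower.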

Cite

Text

Zhang et al. "Unlocking Slot Attention by Changing Optimal Transport Costs." International Conference on Machine Learning, 2023.

Markdown

[Zhang et al. "Unlocking Slot Attention by Changing Optimal Transport Costs." International Conference on Machine Learning, 2023.](https://mlanthology.org/icml/2023/zhang2023icml-unlocking/)

BibTeX

@inproceedings{zhang2023icml-unlocking,
  title     = {{Unlocking Slot Attention by Changing Optimal Transport Costs}},
  author    = {Zhang, Yan and Zhang, David W. and Lacoste-Julien, Simon and Burghouts, Gertjan J. and Snoek, Cees G. M.},
  booktitle = {International Conference on Machine Learning},
  year      = {2023},
  pages     = {41931--41951},
  volume    = {202},
  url       = {https://mlanthology.org/icml/2023/zhang2023icml-unlocking/}
}