Multi-Level Cross-Modal Alignment for Image Clustering

Qiu, Liping; Zhang, Qin; Chen, Xiaojun; Cai, Shaotian

doi:10.1609/AAAI.V38I13.29387

AAAI 2024 pp. 14695-14703

Abstract

Recently, cross-modal pretraining models have been employed to produce meaningful pseudo-labels to supervise the training of an image clustering model. However, numerous erroneous alignments in a cross-modal pretraining model can produce poor-quality pseudo-labels and degrade clustering performance. To address this issue, we propose a novel Multi-level Cross-modal Alignment method that improves the alignments in a cross-modal pretraining model for downstream tasks by building a smaller but better semantic space and aligning images and texts at three levels: instance level, prototype level, and semantic level. Theoretical results show that the proposed method converges and suggest effective means to reduce its expected clustering risk. Experimental results on five benchmark datasets clearly show the superiority of our new method.
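The pseudo-labeling idea mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' multi-level method; it only shows the baseline notion of assigning each image the index of its most similar text embedding by cosine similarity. The function names and the 2-D embeddings are hypothetical, made-up values standing in for the outputs of a real cross-modal model.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def pseudo_labels(image_embeddings, text_embeddings):
    """Label each image with the index of its nearest text embedding.

    Hypothetical helper: a real pipeline would obtain both sets of
    embeddings from a pretrained cross-modal encoder.
    """
    labels = []
    for img in image_embeddings:
        sims = [cosine_similarity(img, txt) for txt in text_embeddings]
        labels.append(max(range(len(sims)), key=sims.__getitem__))
    return labels


# Toy data (assumed values, not from the paper): two text concepts, three images.
text_embeddings = [[1.0, 0.0], [0.0, 1.0]]
image_embeddings = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.5]]

labels = pseudo_labels(image_embeddings, text_embeddings)
print(labels)  # [0, 1, 0]
```

The third image lies close to the boundary between the two concepts, which hints at why erroneous alignments can yield noisy pseudo-labels of the kind the abstract describes.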

Cite

Text

Qiu et al. "Multi-Level Cross-Modal Alignment for Image Clustering." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I13.29387

Markdown

[Qiu et al. "Multi-Level Cross-Modal Alignment for Image Clustering." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/qiu2024aaai-multi/) doi:10.1609/AAAI.V38I13.29387

BibTeX

@inproceedings{qiu2024aaai-multi,
  title     = {{Multi-Level Cross-Modal Alignment for Image Clustering}},
  author    = {Qiu, Liping and Zhang, Qin and Chen, Xiaojun and Cai, Shaotian},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {14695--14703},
  doi       = {10.1609/AAAI.V38I13.29387},
  url       = {https://mlanthology.org/aaai/2024/qiu2024aaai-multi/}
}