How CNNs and ViTs Perceive Similarities Between Categories
Abstract
Vision Transformers (ViTs) and Convolutional Neural Networks (CNNs) trained for supervised tasks are the leading networks used in practical computer vision. Although they rely on different mechanisms, both continuously refine their object recognition capabilities. In this race, it is overall accuracy that matters most. But is it enough? Should we not also care about the correct perception of inter-class similarities? We believe we should, as similarity is a fundamental aspect of categorization and the structure of the world is highly correlated. Models should assess similarities reasonably to achieve more nuanced perception, and we should examine this ability for greater transparency and trust. For this reason, we analyzed what state-of-the-art object recognition networks perceive as similar. We propose a framework to visually and numerically examine and compare the perception of different trained models, and we use it to answer a series of similarity-related questions based on experiments on a large population of 42 models.
Cite
Text
Filus and Domanska. "How CNNs and ViTs Perceive Similarities Between Categories." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025. doi:10.1007/978-3-032-06078-5_2
Markdown
[Filus and Domanska. "How CNNs and ViTs Perceive Similarities Between Categories." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025.](https://mlanthology.org/ecmlpkdd/2025/filus2025ecmlpkdd-cnns/) doi:10.1007/978-3-032-06078-5_2
BibTeX
@inproceedings{filus2025ecmlpkdd-cnns,
title = {{How CNNs and ViTs Perceive Similarities Between Categories}},
author = {Filus, Katarzyna and Domanska, Joanna},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2025},
pages = {22--40},
doi = {10.1007/978-3-032-06078-5_2},
url = {https://mlanthology.org/ecmlpkdd/2025/filus2025ecmlpkdd-cnns/}
}