ProtoS-ViT: Visual Foundation Models for Sparse Self-Explainable Classifications
Abstract
Prototypical networks aim to build intrinsically explainable models based on the linear summation of concepts. Concepts are coherent entities that we, as humans, can recognize and associate with a certain object or entity. However, important challenges remain in the fair evaluation of the explanation quality provided by these models. This work first proposes an extensive set of quantitative and qualitative metrics that allow drawbacks in current prototypical networks to be identified. It then introduces a novel architecture which provides compact explanations, outperforming current prototypical models in terms of explanation quality. Overall, the proposed architecture demonstrates how frozen pre-trained ViT backbones can be effectively turned into prototypical models for both general and domain-specific tasks, in our case biomedical image classifiers. Code is available at https://github.com/hturbe/protosvit.
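As a rough illustration of the "linear summation of concepts" idea described in the abstract, the sketch below shows how patch embeddings from a frozen ViT might be scored against learnable prototype vectors and linearly combined into class logits. This is a minimal sketch under stated assumptions, not the authors' implementation: the class name `PrototypeHead`, the cosine-similarity scoring, the max-pooling over patches, and all dimensions are illustrative choices. The actual architecture is in the linked repository.

```python
# Minimal sketch (not the paper's implementation) of a prototypical
# classification head on top of a frozen ViT backbone. All names and
# hyper-parameters here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PrototypeHead(nn.Module):
    def __init__(self, embed_dim: int, num_prototypes: int, num_classes: int):
        super().__init__()
        # Learnable prototype vectors living in the backbone's embedding space.
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, embed_dim))
        # Linear layer that sums prototype activations into class scores,
        # i.e. the "linear summation of concepts".
        self.classifier = nn.Linear(num_prototypes, num_classes, bias=False)

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (batch, num_patches, embed_dim) from a frozen ViT.
        sims = F.cosine_similarity(
            patch_tokens.unsqueeze(2),                               # (B, P, 1, D)
            self.prototypes.view(1, 1, -1, patch_tokens.size(-1)),   # (1, 1, K, D)
            dim=-1,
        )                                                            # (B, P, K)
        # Each prototype's activation is its best match over image patches,
        # so an explanation can point back to the matching patch.
        proto_scores = sims.max(dim=1).values                        # (B, K)
        # Class logits are a linear combination of prototype activations.
        return self.classifier(proto_scores)


# Usage with dummy tokens shaped like a ViT-B/16 output (196 patches, dim 768):
head = PrototypeHead(embed_dim=768, num_prototypes=50, num_classes=10)
logits = head(torch.randn(2, 196, 768))  # (2, 10)
```

Because the backbone stays frozen, only the prototypes and the final linear layer are trained, which is what allows a generic pre-trained ViT to be repurposed as a self-explainable classifier.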
Cite
Text
Turbe et al. "ProtoS-ViT: Visual Foundation Models for Sparse Self-Explainable Classifications." NeurIPS 2024 Workshops: InterpretableAI, 2024.
Markdown
[Turbe et al. "ProtoS-ViT: Visual Foundation Models for Sparse Self-Explainable Classifications." NeurIPS 2024 Workshops: InterpretableAI, 2024.](https://mlanthology.org/neuripsw/2024/turbe2024neuripsw-protosvit/)
BibTeX
@inproceedings{turbe2024neuripsw-protosvit,
  title     = {{ProtoS-ViT: Visual Foundation Models for Sparse Self-Explainable Classifications}},
  author    = {Turbe, Hugues and Bjelogrlic, Mina and Mengaldo, Gianmarco and Lovis, Christian},
  booktitle = {NeurIPS 2024 Workshops: InterpretableAI},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/turbe2024neuripsw-protosvit/}
}