What Could Go Wrong? Discovering and Describing Failure Modes in Computer Vision

Abstract

In this work, we propose a simple yet effective solution to predict potential failure modes of computer vision models and describe them in natural language. Given a pretrained model and a set of samples, our aim is to find sentences that accurately describe the visual conditions in which the model underperforms. To study this important topic and foster future research on it, we formalize the problem of Language-Based Error Explainability (LBEE) and propose a set of metrics to evaluate and compare different methods for this task. We propose solutions that operate in a joint vision-and-language embedding space and can characterize, through language descriptions, model failures caused, e.g., by objects unseen during training or by adverse visual conditions.
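To make the LBEE setting concrete, here is a minimal, hypothetical sketch. It is not the authors' method. It assumes that image and sentence embeddings have already been computed in a shared vision-and-language space (for example by a CLIP-like encoder), and that each sample is flagged as a model error or not. The helper `rank_failure_descriptions` and the toy 2-D vectors are illustrative assumptions. The helper scores every candidate sentence by how much closer it lies, on average, to the failure samples than to the correctly handled ones.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_failure_descriptions(
    image_embs: List[Vector],
    is_error: List[bool],
    sentence_embs: List[Vector],
    sentences: List[str],
) -> List[Tuple[str, float]]:
    """Hypothetical helper: rank candidate sentences as failure descriptions.

    Assumes all embeddings live in one joint vision-and-language space.
    Score = mean similarity to failure samples minus mean similarity to
    correctly handled samples; higher means more failure-specific.
    """
    errors = [e for e, err in zip(image_embs, is_error) if err]
    correct = [e for e, err in zip(image_embs, is_error) if not err]
    if not errors or not correct:
        raise ValueError("need at least one failure and one success sample")

    scored = []
    for text, t in zip(sentences, sentence_embs):
        sim_err = sum(cosine(x, t) for x in errors) / len(errors)
        sim_ok = sum(cosine(x, t) for x in correct) / len(correct)
        scored.append((text, sim_err - sim_ok))
    # Highest score first: the sentence that best separates failures.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy 2-D "embeddings": the failures sit near the x-axis direction.
    images = [[1.0, 0.1], [0.9, 0.0], [0.1, 1.0], [0.0, 0.9]]
    errors = [True, True, False, False]
    texts = ["a photo taken at night", "a photo taken in daylight"]
    text_embs = [[1.0, 0.0], [0.0, 1.0]]
    for sentence, score in rank_failure_descriptions(images, errors, text_embs, texts):
        print(f"{score:+.3f}  {sentence}")
```

In this toy setup, "a photo taken at night" receives a positive score and ranks first, while the daylight sentence receives the mirror-image negative score. Contrasting failures against successes, rather than only measuring similarity to failures, is a design choice of this sketch. It keeps generic sentences that match every image from ranking highly.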

Cite

Text

Csurka et al. "What Could Go Wrong? Discovering and Describing Failure Modes in Computer Vision." European Conference on Computer Vision Workshops, 2024. doi:10.1007/978-3-031-92648-8_12

Markdown

[Csurka et al. "What Could Go Wrong? Discovering and Describing Failure Modes in Computer Vision." European Conference on Computer Vision Workshops, 2024.](https://mlanthology.org/eccvw/2024/csurka2024eccvw-go/) doi:10.1007/978-3-031-92648-8_12

BibTeX

@inproceedings{csurka2024eccvw-go,
  title     = {{What Could Go Wrong? Discovering and Describing Failure Modes in Computer Vision}},
  author    = {Csurka, Gabriela and Hayes, Tyler L. and Larlus, Diane and Volpi, Riccardo},
  booktitle = {European Conference on Computer Vision Workshops},
  year      = {2024},
  pages     = {183--199},
  doi       = {10.1007/978-3-031-92648-8_12},
  url       = {https://mlanthology.org/eccvw/2024/csurka2024eccvw-go/}
}