Hierarchical Explanations for Video Action Recognition

Abstract

To interpret deep neural networks, one main approach is to dissect the visual input and find the prototypical parts responsible for the classification. However, existing methods often ignore the hierarchical relationship between these prototypes, and thus cannot explain semantic concepts at both a higher level (e.g., water sports) and a lower level (e.g., swimming). In this paper, inspired by the human cognition system, we leverage hierarchical information to deal with uncertainty. To this end, we propose the HIerarchical Prototype Explainer (HIPE), which builds hierarchical relations between prototypes and classes. The faithfulness of our method is verified by reducing the accuracy-explainability trade-off on UCF-101 while providing multi-level explanations.

Cite

Text

Gulshad et al. "Hierarchical Explanations for Video Action Recognition." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023. doi:10.1109/CVPRW59228.2023.00379

Markdown

[Gulshad et al. "Hierarchical Explanations for Video Action Recognition." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023.](https://mlanthology.org/cvprw/2023/gulshad2023cvprw-hierarchical/) doi:10.1109/CVPRW59228.2023.00379

BibTeX

@inproceedings{gulshad2023cvprw-hierarchical,
  title     = {{Hierarchical Explanations for Video Action Recognition}},
  author    = {Gulshad, Sadaf and Long, Teng and van Noord, Nanne},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2023},
  pages     = {3703--3708},
  doi       = {10.1109/CVPRW59228.2023.00379},
  url       = {https://mlanthology.org/cvprw/2023/gulshad2023cvprw-hierarchical/}
}