Incorporating Visual Grounding in GCN for Zero-Shot Learning of Human Object Interaction Actions

Abstract

GCN-based zero-shot learning approaches commonly use fixed input graphs that encode external knowledge, usually derived from language. However, such input graphs fail to capture nuances of the visual domain. We introduce a method to visually ground the external knowledge graph. The method is demonstrated on a novel concept of grouping actions according to a shared notion and achieves superior performance in zero-shot action recognition on two challenging human manipulation action datasets, the EPIC Kitchens dataset and the Charades dataset. We further show that visually grounding the knowledge graph improves the performance of GCNs when an adversarial attack corrupts the input graph.
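
To make the setup concrete, the sketch below shows a standard GCN layer propagating class word embeddings over a knowledge graph, with a language-derived adjacency blended with a visually derived one. This is a minimal illustration only, not the authors' implementation: the placeholder matrices `A_lang` and `A_vis`, the blending weight `alpha`, and all dimensions are assumptions made for the example.

```python
# Minimal sketch (assumed, not the paper's code): GCN propagation over a
# knowledge graph whose adjacency mixes language-based and visually grounded
# edges. All placeholder names (A_lang, A_vis, alpha) are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


def normalize_adjacency(A: torch.Tensor) -> torch.Tensor:
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2}, as in standard GCNs."""
    A_hat = A + torch.eye(A.size(0))
    deg = A_hat.sum(dim=1)
    d_inv_sqrt = torch.diag(deg.clamp(min=1e-12).pow(-0.5))
    return d_inv_sqrt @ A_hat @ d_inv_sqrt


class GCNLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, X: torch.Tensor, A_norm: torch.Tensor) -> torch.Tensor:
        # Aggregate neighbor features via the normalized adjacency, then transform.
        return F.relu(self.linear(A_norm @ X))


# Toy usage: 10 action classes, 300-d word embeddings as node features.
num_classes, feat_dim = 10, 300
X = torch.randn(num_classes, feat_dim)            # language features per class
A_lang = torch.rand(num_classes, num_classes)     # language-derived graph (placeholder)
A_vis = torch.rand(num_classes, num_classes)      # visually grounded graph (placeholder)
A_lang = 0.5 * (A_lang + A_lang.T)                # symmetrize for the normalization step
A_vis = 0.5 * (A_vis + A_vis.T)
alpha = 0.5                                       # illustrative blending weight
A_norm = normalize_adjacency(alpha * A_lang + (1 - alpha) * A_vis)

layer = GCNLayer(feat_dim, 512)
class_embeddings = layer(X, A_norm)               # per-class embeddings for zero-shot scoring
```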

Cite

Text

Devaraj et al. "Incorporating Visual Grounding in GCN for Zero-Shot Learning of Human Object Interaction Actions." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023. doi:10.1109/CVPRW59228.2023.00529

Markdown

[Devaraj et al. "Incorporating Visual Grounding in GCN for Zero-Shot Learning of Human Object Interaction Actions." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023.](https://mlanthology.org/cvprw/2023/devaraj2023cvprw-incorporating/) doi:10.1109/CVPRW59228.2023.00529

BibTeX

@inproceedings{devaraj2023cvprw-incorporating,
  title     = {{Incorporating Visual Grounding in GCN for Zero-Shot Learning of Human Object Interaction Actions}},
  author    = {Devaraj, Chinmaya and Fermüller, Cornelia and Aloimonos, Yiannis},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2023},
  pages     = {5008--5017},
  doi       = {10.1109/CVPRW59228.2023.00529},
  url       = {https://mlanthology.org/cvprw/2023/devaraj2023cvprw-incorporating/}
}