Self-Supervised Video Interaction Classification Using Image Representation of Skeleton Data

Farzaneh Askari, Ruixi Jiang, Zhiwei Li, Jiatong Niu, Yuyan Shi, James J. Clark

CVPRW 2023 pp. 5229-5238

doi:10.1109/CVPRW59228.2023.00551

Abstract

Recognizing interactions in sports broadcast videos is an application of Interaction Recognition from Videos (IRV) that poses many challenges, because the interactions are complex and often recorded from a suboptimal viewpoint. Annotating large-scale, sports-specific datasets is expensive and time-consuming. In this study, we therefore demonstrate the effectiveness of Self-Supervised Learning (SSL) methods for building useful representations from human skeleton pose data (pose for short) without requiring costly annotations for a large-scale dataset. Given the numerous well-established image-based SSL methods, we show how to adapt them to sequences of pose through data transformation and a series of pose-based augmentations. We specifically adapt Relational Reasoning SSL (Relational-SSL for short) [27] and achieve 68.18 ± 0% and 76.62 ± 2.7% under the linear evaluation and fine-tuning protocols, respectively, on the downstream task of IRV from sports broadcast videos. Lastly, we run ablation studies on different components of the method, including the effect of using estimated pose (versus ground truth) on downstream task performance.
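The abstract does not specify the data transformation, so the following is only an illustrative sketch of one common way to turn a pose sequence into an image that an image-based SSL method could consume. The layout (joints as rows, frames as columns, coordinates as color channels), the min-max normalization, and the function names `pose_sequence_to_image` and `mirror_pose` are all assumptions made for this example, not the authors' implementation.

```python
import numpy as np


def pose_sequence_to_image(poses: np.ndarray) -> np.ndarray:
    """Encode a pose sequence as an image (hypothetical layout, not the paper's).

    poses: array of shape (frames, joints, coords) with coords in {2, 3}.
    Returns a uint8 array of shape (joints, frames, 3).
    """
    if poses.ndim != 3 or poses.shape[2] not in (2, 3):
        raise ValueError("expected shape (frames, joints, 2 or 3)")
    frames, joints, coords = poses.shape

    # Min-max normalize each coordinate channel over the whole sequence.
    lo = poses.min(axis=(0, 1), keepdims=True)
    hi = poses.max(axis=(0, 1), keepdims=True)
    scale = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    norm = (poses - lo) / scale  # values in [0, 1]

    # Assumption: 2D poses get an all-zero third channel.
    if coords == 2:
        norm = np.concatenate([norm, np.zeros((frames, joints, 1))], axis=2)

    # Rows index joints, columns index frames, channels hold coordinates.
    return np.round(norm * 255).astype(np.uint8).transpose(1, 0, 2)


def mirror_pose(poses: np.ndarray) -> np.ndarray:
    """Hypothetical pose-based augmentation: negate the x coordinate."""
    flipped = poses.copy()
    flipped[..., 0] = -flipped[..., 0]
    return flipped
```

Under these assumptions, a 16-frame sequence of 17 two-dimensional joints becomes a 17 × 16 × 3 image, and an augmentation such as `mirror_pose` is applied to the raw coordinates before encoding, so the image stays a valid rendering of a plausible pose.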


Cite

Text

Askari et al. "Self-Supervised Video Interaction Classification Using Image Representation of Skeleton Data." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023. doi:10.1109/CVPRW59228.2023.00551

Markdown

[Askari et al. "Self-Supervised Video Interaction Classification Using Image Representation of Skeleton Data." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023.](https://mlanthology.org/cvprw/2023/askari2023cvprw-selfsupervised/) doi:10.1109/CVPRW59228.2023.00551

BibTeX

@inproceedings{askari2023cvprw-selfsupervised,
  title     = {{Self-Supervised Video Interaction Classification Using Image Representation of Skeleton Data}},
  author    = {Askari, Farzaneh and Jiang, Ruixi and Li, Zhiwei and Niu, Jiatong and Shi, Yuyan and Clark, James J.},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2023},
  pages     = {5229-5238},
  doi       = {10.1109/CVPRW59228.2023.00551},
  url       = {https://mlanthology.org/cvprw/2023/askari2023cvprw-selfsupervised/}
}