SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos

Abstract

Video-based visual relation detection tasks, such as video scene graph generation, play important roles in fine-grained video understanding. However, current video visual relation detection datasets have two main limitations that hinder research progress in this area. First, they do not explore complex human-human interactions in multi-person scenarios. Second, the relation types in existing datasets have relatively low-level semantics and can often be recognized from appearance or simple prior information, without the need for detailed spatio-temporal context reasoning. Nevertheless, comprehending high-level interactions between humans is crucial for understanding complex multi-person videos such as sports and surveillance videos. To address this issue, we propose a new video visual relation detection task, video human-human interaction detection, and build a dataset named SportsHHI for it. SportsHHI contains 34 high-level interaction classes from basketball and volleyball. In total, 118,075 human bounding boxes and 50,649 interaction instances are annotated on 11,398 keyframes. To benchmark this task, we propose a two-stage baseline method and conduct extensive experiments to reveal the key factors for a successful human-human interaction detector. We hope that SportsHHI can stimulate research on human interaction understanding in videos and promote the development of spatio-temporal context modeling techniques in video visual relation detection.
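To make the task concrete, here is a minimal sketch that models one interaction instance as a directed (subject, object, class) triplet of two human boxes on a keyframe and checks a predicted triplet against a ground-truth one. The schema (`Interaction`, `(x1, y1, x2, y2)` pixel boxes, string labels such as `"pass"`) and the matching rule (same class, and both subject and object boxes at IoU >= 0.5) are hypothetical, borrowed from common practice in visual relation detection; they are not the SportsHHI annotation format or its evaluation protocol.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class Interaction:
    """One human-human interaction instance on a keyframe (hypothetical schema)."""
    subject: Box  # box of the person performing the interaction
    obj: Box      # box of the person the interaction is directed at
    label: str    # interaction class name (hypothetical string labels)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches(pred: Interaction, gt: Interaction, thr: float = 0.5) -> bool:
    """Assumed criterion: same class, and both boxes overlap their GT boxes by >= thr.

    Subject and object are compared in order, so swapping the two people
    yields a different (non-matching) triplet.
    """
    return (
        pred.label == gt.label
        and iou(pred.subject, gt.subject) >= thr
        and iou(pred.obj, gt.obj) >= thr
    )


gt = Interaction((0, 0, 10, 10), (20, 0, 30, 10), "pass")
print(matches(Interaction((1, 0, 10, 10), (20, 0, 30, 10), "pass"), gt))  # True
print(matches(Interaction((20, 0, 30, 10), (0, 0, 10, 10), "pass"), gt))  # False
```

Treating the pair as ordered reflects that interactions between two players are generally directional. For scale, the reported counts correspond to roughly 10.4 annotated people and 4.4 interaction instances per keyframe (118,075 / 11,398 and 50,649 / 11,398).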

Cite

Text

Wu et al. "SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.01754

Markdown

[Wu et al. "SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/wu2024cvpr-sportshhi/) doi:10.1109/CVPR52733.2024.01754

BibTeX

@inproceedings{wu2024cvpr-sportshhi,
  title     = {{SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos}},
  author    = {Wu, Tao and He, Runyu and Wu, Gangshan and Wang, Limin},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {18537--18546},
  doi       = {10.1109/CVPR52733.2024.01754},
  url       = {https://mlanthology.org/cvpr/2024/wu2024cvpr-sportshhi/}
}