Attentive Spatio-Temporal Representation Learning for Diving Classification

Abstract

Competitive diving is a well-recognized aquatic sport in which a person dives from a platform or a springboard into the water. Based on the acrobatics performed during the dive, each dive is assigned to one of a finite set of action classes standardized by FINA. In this work, we propose an attention-guided LSTM-based neural network architecture for the task of diving classification. The network takes the frames of a diving video as input and determines its class. We evaluate the performance of the proposed model on a recently introduced competitive diving dataset, Diving48, which contains over 18,000 video clips covering 48 classes of diving. The proposed model outperforms the state-of-the-art 2D and 3D models in classification accuracy by 11.54% and 4.24%, respectively. We show that the network is able to localize the diver in the video frames during the dive without being trained with such supervision.
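
The abstract describes the architecture only at a high level: an attention-guided LSTM that consumes video frames, classifies the dive, and yields spatial attention maps that localize the diver. The following PyTorch sketch illustrates that general pattern; it is not the authors' implementation. The ResNet-18 backbone, hidden size, and additive spatial-attention form are hypothetical choices made for illustration.

# Minimal sketch of an attention-guided CNN-LSTM video classifier.
# NOT the paper's architecture: backbone, sizes, and attention form
# are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

class AttentiveDivingClassifier(nn.Module):
    def __init__(self, num_classes=48, hidden_size=512):
        super().__init__()
        # CNN backbone producing a 7x7 grid of 512-d features per frame
        backbone = models.resnet18(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])
        self.feat_dim = 512
        # Spatial attention: scores each of the 49 locations at every
        # time step, conditioned on the previous LSTM hidden state
        self.attn = nn.Linear(self.feat_dim + hidden_size, 1)
        self.lstm_cell = nn.LSTMCell(self.feat_dim, hidden_size)
        self.classifier = nn.Linear(hidden_size, num_classes)
        self.hidden_size = hidden_size

    def forward(self, frames):
        # frames: (batch, time, 3, 224, 224)
        b, t = frames.shape[:2]
        h = frames.new_zeros(b, self.hidden_size)
        c = frames.new_zeros(b, self.hidden_size)
        attn_maps = []
        for step in range(t):
            feats = self.cnn(frames[:, step])         # (b, 512, 7, 7)
            feats = feats.flatten(2).transpose(1, 2)  # (b, 49, 512)
            h_rep = h.unsqueeze(1).expand(-1, feats.size(1), -1)
            scores = self.attn(torch.cat([feats, h_rep], dim=-1))
            alpha = F.softmax(scores, dim=1)          # (b, 49, 1)
            attn_maps.append(alpha.view(b, 7, 7))
            # Attention-weighted feature vector fed to the LSTM
            context = (alpha * feats).sum(dim=1)      # (b, 512)
            h, c = self.lstm_cell(context, (h, c))
        logits = self.classifier(h)
        # attn_maps can be visualized to localize the diver per frame,
        # echoing the localization-without-supervision claim above
        return logits, attn_maps

# Example: classify a batch of two 16-frame clips
model = AttentiveDivingClassifier()
clip = torch.randn(2, 16, 3, 224, 224)
logits, attn = model(clip)
print(logits.shape)  # torch.Size([2, 48])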

Cite

Text

Kanojia et al. "Attentive Spatio-Temporal Representation Learning for Diving Classification." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2019. doi:10.1109/CVPRW.2019.00302

Markdown

[Kanojia et al. "Attentive Spatio-Temporal Representation Learning for Diving Classification." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2019.](https://mlanthology.org/cvprw/2019/kanojia2019cvprw-attentive/) doi:10.1109/CVPRW.2019.00302

BibTeX

@inproceedings{kanojia2019cvprw-attentive,
  title     = {{Attentive Spatio-Temporal Representation Learning for Diving Classification}},
  author    = {Kanojia, Gagan and Kumawat, Sudhakar and Raman, Shanmuganathan},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2019},
  pages     = {2467--2476},
  doi       = {10.1109/CVPRW.2019.00302},
  url       = {https://mlanthology.org/cvprw/2019/kanojia2019cvprw-attentive/}
}