Rugby Scene Classification Enhanced by Vision Language Model

Abstract

This study investigates the integration of vision language models (VLMs) to enhance the classification of situations in rugby match broadcasts. Accurately identifying situations in sports videos is important for understanding game dynamics and for downstream tasks such as performance evaluation and injury prevention. Using a dataset of 18,000 labeled images extracted at 0.2-second intervals from 100 minutes of rugby match broadcasts, several scene classification tasks were performed: classification of contact plays (scrums, mauls, rucks, tackles, and lineouts), separate classification of rucks, tackles, and lineouts, and multiclass classification. The study aims to validate whether VLM outputs improve classification performance compared with using image data alone. Experimental results show substantial performance improvements across all tasks when VLM outputs are incorporated. Our analysis of prompts suggests that, when provided with appropriate contextual information through natural language, VLMs can effectively capture the context of a given image. These findings indicate that leveraging VLMs in sports analysis holds promise for developing image processing models that incorporate both the tacit knowledge encoded within language models and the information conveyed through natural language descriptions.
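The abstract does not specify how image data and VLM outputs are combined. As a hypothetical illustration only, the sketch below shows one common way to combine two such signals, late fusion by concatenation, assuming the image and the VLM's text output have each already been encoded as fixed-length vectors (the encoders are not specified here). The functions `l2_normalize`, `fuse`, `linear_head`, `softmax`, and `predict`, the class names, and the toy weights are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical late-fusion sketch; NOT the method described in the paper.


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [float(x) for x in vec]
    return [x / norm for x in vec]


def fuse(image_feat: Sequence[float], text_feat: Sequence[float]) -> List[float]:
    """Normalize each modality separately, then concatenate (image first)."""
    return l2_normalize(image_feat) + l2_normalize(text_feat)


def linear_head(features: Sequence[float],
                weights: Sequence[Sequence[float]],
                biases: Sequence[float]) -> List[float]:
    """One logit per class: dot(weight_row, features) + bias."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, biases)]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(image_feat, text_feat, weights, biases,
            classes: Sequence[str]) -> Tuple[str, List[float]]:
    """Return the most probable class name and the full probability vector."""
    probs = softmax(linear_head(fuse(image_feat, text_feat), weights, biases))
    best = max(range(len(classes)), key=lambda i: probs[i])
    return classes[best], probs


# Hypothetical toy values: 2-D image feature, 2-D text feature, two classes.
weights = [[1.0, 0.0, 0.0, 0.0],   # "ruck" responds to the first image dimension
           [0.0, 0.0, 0.0, 1.0]]   # "tackle" responds to the second text dimension
biases = [0.0, 0.0]
label, probs = predict([3.0, 4.0], [0.0, 2.0], weights, biases, ["ruck", "tackle"])
print(label)  # tackle
```

In this toy example, replacing the text vector with zeros flips the prediction to `ruck`, which shows only the mechanism by which an added modality can change a decision; it says nothing about the results reported in the paper.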

Cite

Text

Nonaka et al. "Rugby Scene Classification Enhanced by Vision Language Model." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024. doi:10.1109/CVPRW63382.2024.00331

Markdown

[Nonaka et al. "Rugby Scene Classification Enhanced by Vision Language Model." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024.](https://mlanthology.org/cvprw/2024/nonaka2024cvprw-rugby/) doi:10.1109/CVPRW63382.2024.00331

BibTeX

@inproceedings{nonaka2024cvprw-rugby,
  title     = {{Rugby Scene Classification Enhanced by Vision Language Model}},
  author    = {Nonaka, Naoki and Fujihira, Ryo and Koshiba, Toshiki and Maeda, Akira and Seita, Jun},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2024},
  pages     = {3256--3266},
  doi       = {10.1109/CVPRW63382.2024.00331},
  url       = {https://mlanthology.org/cvprw/2024/nonaka2024cvprw-rugby/}
}