Fine-Tuning Human Pose Estimations in Videos

Abstract

We propose a semi-supervised self-training method for fine-tuning human pose estimations in videos that provides accurate estimations even for complex sequences. We surpass the state of the art on most of the datasets used and also show a 2.33% gain over the baseline on our new dataset of unrestricted sports videos. The self-training model presented has two components: a static Pictorial Structure (PS) based model and a dynamic ensemble of exemplars. We present a pose-quality criterion that is primarily used for batch selection and automatic parameter selection. The same criterion also works as a low-level pose evaluator in post-processing. We set a new challenge by introducing CVIT-SPORTS, a complex dataset of sports videos with full human body-part annotations. The strength of our method is demonstrated by adapting to videos of complex activities such as cricket bowling, cricket batting, and football, as well as to available standard datasets.
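As a rough illustration of the self-training idea the abstract describes (predict, score with a pose-quality criterion, select a high-confidence batch, retrain), here is a minimal generic sketch. This is not the authors' implementation; all names (`estimate_pose`, `pose_quality`, `retrain`) and the threshold value are hypothetical placeholders.

```python
# Illustrative sketch of a generic self-training loop for refining
# per-frame pose estimates. NOT the paper's actual method; the model
# interface (estimate_pose, pose_quality, retrain) is assumed.

def self_train(frames, model, quality_threshold=0.8, rounds=3):
    """Iteratively retrain on frames whose pose estimates score highest."""
    for _ in range(rounds):
        # 1. Predict a pose for every frame with the current model.
        poses = [model.estimate_pose(f) for f in frames]
        # 2. Score each estimate with a pose-quality criterion and
        #    keep only a high-confidence batch (batch selection).
        batch = [(f, p) for f, p in zip(frames, poses)
                 if model.pose_quality(f, p) >= quality_threshold]
        if not batch:
            break
        # 3. Fold the selected pseudo-labels back into training.
        model.retrain(batch)
    return model
```

In the paper's setting, the quality criterion additionally drives automatic parameter selection and serves as a post-processing pose evaluator; the sketch above covers only the batch-selection role.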

Cite

Text

Singh et al. "Fine-Tuning Human Pose Estimations in Videos." IEEE/CVF Winter Conference on Applications of Computer Vision, 2016. doi:10.1109/WACV.2016.7477680

Markdown

[Singh et al. "Fine-Tuning Human Pose Estimations in Videos." IEEE/CVF Winter Conference on Applications of Computer Vision, 2016.](https://mlanthology.org/wacv/2016/singh2016wacv-fine/) doi:10.1109/WACV.2016.7477680

BibTeX

@inproceedings{singh2016wacv-fine,
  title     = {{Fine-Tuning Human Pose Estimations in Videos}},
  author    = {Singh, Digvijay and Balasubramanian, Vineeth and Jawahar, C. V.},
  booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision},
  year      = {2016},
  pages     = {1--9},
  doi       = {10.1109/WACV.2016.7477680},
  url       = {https://mlanthology.org/wacv/2016/singh2016wacv-fine/}
}