AugData Distillation for Monocular 3D Human Pose Estimation

Kim, Jiman

doi:10.1109/CVPRW63382.2024.00763

CVPRW 2024 pp. 7672-7681

Abstract

Lifting a 2D human pose to the correct 3D pose requires a large amount of data, but publicly available data is very limited. In particular, because monocular algorithms use only the limited visual information acquired from a single viewpoint, the amount of data available to them is much smaller than in the multi-view setting. To overcome this problem, 2D-3D pair augmentation methods have been proposed, but they mainly focus on increasing the amount of data. Recent research, however, shows that data quality, rather than quantity, has a significant impact on performance. This paper proposes AugData Distillation (ADD), which can substantially reduce 3D human pose estimation error with only a small amount of augmentation by considering the quality and quantity of training data together. Quality distillation selects, from all augmented data, the core data that contributes significantly to performance improvement. Scale distillation adjusts the total amount of augmentation. Together, these processes remove meaningless data and let the 3D pose estimator train on the core information. We selected TAG-Net [15] as the baseline model to verify the performance improvement of the data-centric method. Although it is not the top-ranked method among all 3D human pose estimators, it achieved the highest accuracy among monocular data-centric methods. Experimental results show that our approach reduced the baseline method’s 3D human pose estimation error by 22% with only 1.6 times augmentation. This means that most of the augmented data used to train the baseline model adversely affects performance. A much lower estimation error can be expected if ADD is combined with the latest network architectures.
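
The two steps named in the abstract (quality distillation, then scale distillation) can be pictured as a filter followed by a budget cap. The sketch below is only an illustration under stated assumptions, not the paper's procedure: the per-sample `scores`, the `min_score` threshold, the function name `distill_augmented`, and the reading of "1.6 times augmentation" as a cap of 1.6 augmented samples per original sample are all hypothetical, since the abstract does not specify how quality is measured or how the scale is set.

```python
def distill_augmented(scores, n_original, scale=1.6, min_score=0.0):
    """Return indices of augmented samples to keep, best first.

    scores     : hypothetical per-sample usefulness scores (higher is better).
    n_original : number of original (non-augmented) 2D-3D pairs.
    scale      : assumed cap on kept augmented samples, as a multiple of n_original.
    min_score  : assumed quality threshold; samples at or below it are dropped.
    """
    # Quality step (assumption): keep only samples scoring above the threshold.
    candidates = [i for i, s in enumerate(scores) if s > min_score]

    # Scale step (assumption): cap the kept set at scale * n_original samples,
    # preferring the highest-scoring candidates.
    budget = int(scale * n_original)
    candidates.sort(key=lambda i: scores[i], reverse=True)
    return candidates[:budget]


# Toy usage: six augmented samples, two original samples, budget int(1.6 * 2) = 3.
keep = distill_augmented([0.9, -0.2, 0.5, 0.1, 0.7, 0.0], n_original=2)
print(keep)
```

In this toy run, the samples with non-positive scores are discarded first, and the remaining four compete for three slots. How the paper actually scores samples and chooses the scale would need to be taken from the full text.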

Cite

Text

Kim. "AugData Distillation for Monocular 3D Human Pose Estimation." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024. doi:10.1109/CVPRW63382.2024.00763

Markdown

[Kim. "AugData Distillation for Monocular 3D Human Pose Estimation." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024.](https://mlanthology.org/cvprw/2024/kim2024cvprw-augdata/) doi:10.1109/CVPRW63382.2024.00763

BibTeX

@inproceedings{kim2024cvprw-augdata,
  title     = {{AugData Distillation for Monocular 3D Human Pose Estimation}},
  author    = {Kim, Jiman},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2024},
  pages     = {7672--7681},
  doi       = {10.1109/CVPRW63382.2024.00763},
  url       = {https://mlanthology.org/cvprw/2024/kim2024cvprw-augdata/}
}