SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation
Abstract
Recently, transformer-based methods have achieved state-of-the-art prediction quality in human pose estimation (HPE). Nonetheless, most of these top-performing transformer-based models are too computationally expensive and storage-demanding to deploy on edge computing platforms. Transformer-based models that require fewer resources are prone to under-fitting because of their smaller scale, and thus perform notably worse than their larger counterparts. Given this conundrum, we introduce SDPose, a new self-distillation method for improving the performance of small transformer-based models. To mitigate under-fitting, we design a transformer module named Multi-Cycled Transformer (MCT), which runs multiple cycled forward passes to more fully exploit the potential of a small set of model parameters. Further, to avoid the additional inference cost that MCT would incur, we introduce a self-distillation scheme that transfers the knowledge from the MCT module to a naive forward model. Specifically, on the MSCOCO validation set, SDPose-T obtains 69.7% mAP with 4.4M parameters and 1.8 GFLOPs. Furthermore, SDPose-S-V2 obtains 73.5% mAP on the MSCOCO validation set with 6.2M parameters and 4.7 GFLOPs, achieving a new state of the art among predominant tiny neural network methods.
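The two ideas in the abstract, reusing one set of weights over several forward cycles and distilling the multi-cycle output into a plain single pass, can be illustrated with a toy sketch. This is not the authors' implementation: the single-head attention block (`shared_block`), the cycle count, and the mean-squared-error distillation loss are all hypothetical choices made for illustration, and the pure-Python code only computes the loss rather than training anything.

```python
import math
import random

def matvec(W, x):
    # W: list of rows (out_dim x in_dim); x: vector of length in_dim
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def shared_block(tokens, Wq, Wk, Wv):
    """Hypothetical toy block: single-head self-attention plus a residual connection."""
    d = len(tokens[0])
    q = [matvec(Wq, t) for t in tokens]
    k = [matvec(Wk, t) for t in tokens]
    v = [matvec(Wv, t) for t in tokens]
    out = []
    for i, qi in enumerate(q):
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        attn = softmax(scores)
        mixed = [sum(attn[j] * v[j][c] for j in range(len(v))) for c in range(d)]
        out.append([t + m for t, m in zip(tokens[i], mixed)])  # residual add
    return out

def multi_cycled_forward(tokens, weights, cycles):
    """Apply the SAME weights `cycles` times (weight sharing across cycles)."""
    for _ in range(cycles):
        tokens = shared_block(tokens, *weights)
    return tokens

def distillation_loss(student, teacher):
    """Assumed MSE between single-pass (student) and multi-cycle (teacher) tokens."""
    n = sum(len(t) for t in student)
    return sum((s - t) ** 2
               for st, tt in zip(student, teacher)
               for s, t in zip(st, tt)) / n

# Usage: one weight set, a 3-cycle teacher pass, and a 1-cycle student pass.
random.seed(0)
d, n_tokens = 4, 3

def rand_matrix(rows, cols):
    return [[random.gauss(0, 0.5) for _ in range(cols)] for _ in range(rows)]

weights = (rand_matrix(d, d), rand_matrix(d, d), rand_matrix(d, d))
tokens = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_tokens)]
teacher = multi_cycled_forward(tokens, weights, cycles=3)  # training-time only
student = multi_cycled_forward(tokens, weights, cycles=1)  # what runs at inference
loss = distillation_loss(student, teacher)
print(f"self-distillation loss: {loss:.4f}")
```

In a real training setup the teacher output would typically be detached from the gradient graph and the loss backpropagated into the shared weights. At inference only the single-cycle path is kept, so its cost matches that of one forward pass through the block.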
Cite
Text
Chen et al. "SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00109
Markdown
[Chen et al. "SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/chen2024cvpr-sdpose/) doi:10.1109/CVPR52733.2024.00109
BibTeX
@inproceedings{chen2024cvpr-sdpose,
title = {{SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation}},
author = {Chen, Sichen and Zhang, Yingyi and Huang, Siming and Yi, Ran and Fan, Ke and Zhang, Ruixin and Chen, Peixian and Wang, Jun and Ding, Shouhong and Ma, Lizhuang},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2024},
pages = {1082-1090},
doi = {10.1109/CVPR52733.2024.00109},
url = {https://mlanthology.org/cvpr/2024/chen2024cvpr-sdpose/}
}