DeepFuse: An IMU-Aware Network for Real-Time 3D Human Pose Estimation from Multi-View Image
Abstract
In this paper, we propose a two-stage fully 3D network, namely DeepFuse, to estimate human pose in 3D space by deeply fusing body-worn Inertial Measurement Unit (IMU) data with multi-view images. The first stage is designed for pure vision estimation. To preserve the primitiveness of the multi-view inputs, the vision stage uses a multi-channel volume as the data representation and a 3D soft-argmax as the activation layer. The second stage is the IMU refinement stage, which introduces an IMU-bone layer to fuse the IMU and vision data earlier, at the data level. Without requiring a given skeleton model a priori, we achieve a mean joint error of 28.9mm on the TotalCapture dataset and 13.4mm on the Human3.6M dataset under Protocol 1, improving the state-of-the-art (SOTA) result by a large margin. Finally, we experimentally discuss the effectiveness of a fully 3D network for 3D pose estimation, which may benefit future research.
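The 3D soft-argmax mentioned in the abstract is the differentiable counterpart of a hard argmax over a volumetric heatmap: the volume is softmaxed into a probability distribution over voxels, and each joint's location is read out as the expected voxel coordinate. The sketch below is not the paper's code; it is a minimal PyTorch illustration of that general idea, with the tensor layout (batch, joints, depth, height, width) and the name soft_argmax_3d chosen here for illustration.

import torch
import torch.nn.functional as F

def soft_argmax_3d(heatmaps):
    """Differentiable 3D soft-argmax over volumetric heatmaps.

    heatmaps: (B, J, D, H, W), one volume per joint.
    Returns: (B, J, 3) expected (x, y, z) coordinates in voxel units.
    """
    b, j, d, h, w = heatmaps.shape
    # Softmax over the flattened volume so each joint's heatmap
    # becomes a probability distribution over voxels.
    probs = F.softmax(heatmaps.view(b, j, -1), dim=-1).view(b, j, d, h, w)

    # Index grids along each axis.
    zs = torch.arange(d, dtype=probs.dtype, device=probs.device)
    ys = torch.arange(h, dtype=probs.dtype, device=probs.device)
    xs = torch.arange(w, dtype=probs.dtype, device=probs.device)

    # Expected coordinate = sum over voxels of prob * index,
    # marginalizing out the other two axes first.
    z = (probs.sum(dim=(3, 4)) * zs).sum(dim=-1)  # (B, J)
    y = (probs.sum(dim=(2, 4)) * ys).sum(dim=-1)
    x = (probs.sum(dim=(2, 3)) * xs).sum(dim=-1)
    return torch.stack([x, y, z], dim=-1)

Because every step is differentiable, such a readout lets coordinate-level losses backpropagate through the volumetric representation, which is what makes it usable as an activation layer inside an end-to-end network.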
Cite
Text
Huang et al. "DeepFuse: An IMU-Aware Network for Real-Time 3D Human Pose Estimation from Multi-View Image." Winter Conference on Applications of Computer Vision, 2020.
Markdown
[Huang et al. "DeepFuse: An IMU-Aware Network for Real-Time 3D Human Pose Estimation from Multi-View Image." Winter Conference on Applications of Computer Vision, 2020.](https://mlanthology.org/wacv/2020/huang2020wacv-deepfuse/)
BibTeX
@inproceedings{huang2020wacv-deepfuse,
title = {{DeepFuse: An IMU-Aware Network for Real-Time 3D Human Pose Estimation from Multi-View Image}},
author = {Huang, Fuyang and Zeng, Ailing and Liu, Minhao and Lai, Qiuxia and Xu, Qiang},
booktitle = {Winter Conference on Applications of Computer Vision},
year = {2020},
url = {https://mlanthology.org/wacv/2020/huang2020wacv-deepfuse/}
}