Epipolar Transformer for Multi-View Human Pose Estimation
Abstract
A common way to localize 3D human joints in a synchronized and calibrated multi-view setup is a two-step process: (1) apply a 2D detector separately on each view to localize joints in 2D, and (2) apply robust triangulation to the 2D detections from each view to obtain the 3D joint locations. However, in step 1, the 2D detector must resolve challenging cases, such as occlusions and oblique viewing angles, purely in 2D and without leveraging any 3D information, even though these cases could be better resolved in 3D. Therefore, we propose the differentiable "epipolar transformer", which enables the 2D detector to leverage 3D-aware features to improve 2D pose estimation. The intuition is: given a 2D location p in the reference view, we would like to first find its corresponding point p' in the source view, then combine the features at p' with the features at p, thus leading to a more 3D-aware feature at p. Inspired by stereo matching, the epipolar transformer leverages epipolar constraints and feature matching to approximate the features at p'. The key advantages of the epipolar transformer are: (1) it has minimal learnable parameters, (2) it can be easily plugged into existing networks, and (3) it is interpretable, i.e., we can analyze the location p' to understand whether matching over the epipolar line was successful. Experiments on Human3.6M [9] show that our approach yields consistent improvements over the baselines. Specifically, in the condition where no external data is used, our Human3.6M model trained with ResNet-50 and image size 256 x 256 outperforms the state of the art by a large margin and achieves an MPJPE of 26.9 mm. Code is available. This is the workshop version of our CVPR 2020 paper [8].
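The intuition in the abstract (search along the epipolar line for p', then combine the features at p' with those at p) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a known fundamental matrix `F` that maps reference-view pixels to source-view epipolar lines, nearest-neighbor feature lookup, dot-product similarity with a softmax over the sampled points, and fusion by simple addition. The function names, `num_samples`, and the non-vertical-line restriction are all hypothetical choices made for illustration.

```python
import numpy as np


def epipolar_line(F, p):
    """Line (a, b, c) in the source view, a*x + b*y + c = 0, for reference pixel p = (x, y)."""
    return F @ np.array([p[0], p[1], 1.0])


def sample_on_line(line, width, num_samples):
    """Sample (x, y) points on the line at evenly spaced x; assumes b != 0 (non-vertical line)."""
    a, b, c = line
    xs = np.linspace(0, width - 1, num_samples)
    ys = -(a * xs + c) / b
    return np.stack([xs, ys], axis=1)


def fuse_features(ref_feat, src_feat_map, F, p, num_samples=64):
    """Approximate the source feature at p' by soft matching along the epipolar line.

    ref_feat:     (C,) feature at p in the reference view
    src_feat_map: (H, W, C) feature map of the source view
    Returns the fused feature, the in-image sample points, and their weights.
    """
    h, w, _ = src_feat_map.shape
    pts = sample_on_line(epipolar_line(F, p), w, num_samples)

    # Nearest-neighbor lookup (an assumption); drop samples outside the image.
    cols = np.rint(pts[:, 0]).astype(int)
    rows = np.rint(pts[:, 1]).astype(int)
    valid = (rows >= 0) & (rows < h)
    if not valid.any():
        return ref_feat, pts[valid], np.zeros(0)

    candidates = src_feat_map[rows[valid], cols[valid]]  # (K, C)

    # Dot-product similarity followed by a numerically stable softmax.
    scores = candidates @ ref_feat
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()

    # The weighted sum stands in for the feature at p'; fuse by addition.
    matched = weights @ candidates
    return ref_feat + matched, pts[valid], weights
```

The returned weights are what makes such a scheme inspectable: the sample with the largest weight indicates where along the epipolar line the match was found, which is the kind of analysis of p' the abstract refers to.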
Cite
Text
He et al. "Epipolar Transformer for Multi-View Human Pose Estimation." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020. doi:10.1109/CVPRW50498.2020.00526
Markdown
[He et al. "Epipolar Transformer for Multi-View Human Pose Estimation." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020.](https://mlanthology.org/cvprw/2020/he2020cvprw-epipolar/) doi:10.1109/CVPRW50498.2020.00526
BibTeX
@inproceedings{he2020cvprw-epipolar,
title = {{Epipolar Transformer for Multi-View Human Pose Estimation}},
author = {He, Yihui and Yan, Rui and Fragkiadaki, Katerina and Yu, Shoou-I},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2020},
pages = {4466-4471},
doi = {10.1109/CVPRW50498.2020.00526},
url = {https://mlanthology.org/cvprw/2020/he2020cvprw-epipolar/}
}