From Sparse to Dense: Spatio-Temporal Fusion for Multi-View 3D Human Pose Estimation with DenseWarper

Li, Ling; Chen, Changjie; Wang, Yuyan; Lyu, Jiaqing; Chang, Kenglun; Chen, Yiyun; Deng, Zhidong

ICLR 2026

Abstract

In multi-view 3D human pose estimation, models typically rely on images captured simultaneously from different camera views to predict the pose at a single moment. While this provides accurate spatial information, it often overlooks the rich temporal dependencies between adjacent frames. To address this, we propose a novel input scheme for 3D human pose estimation: sparse interleaved input. This scheme uses images captured from different camera views at different time points (e.g., View 1 at time $t$ and View 2 at time $t+\delta$), allowing our model to capture rich spatio-temporal information and effectively boost performance. More importantly, the approach offers two key advantages. First, it can theoretically increase the output pose frame rate by a factor of $N$ with $N$ cameras, breaking through single-view frame rate limitations and enhancing the temporal resolution of the output. Second, by using only a sparse subset of the available frames, our method reduces data redundancy while achieving better performance. We introduce the DenseWarper model, which leverages epipolar geometry for efficient spatio-temporal heatmap exchange. We conducted extensive experiments on the Human3.6M and MPI-INF-3DHP datasets. The results demonstrate that our method, using only sparse interleaved images as input, outperforms traditional dense multi-view input approaches and achieves state-of-the-art performance.
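
The frame-rate claim can be illustrated with a small scheduling sketch. It is not taken from the paper: it assumes that the $N$ cameras are triggered with evenly staggered offsets, so that $\delta = 1/(N f)$ for a per-camera frame rate $f$, and the names `interleaved_schedule` and `effective_rate` are hypothetical helpers introduced only for this illustration. Under that assumption, one view arrives every $\delta$ seconds, so a pose could in principle be produced at rate $N f$.

```python
from fractions import Fraction


def interleaved_schedule(num_cameras, camera_fps, num_cycles):
    """List (timestamp, camera_index) capture events for staggered cameras.

    Assumption (not from the paper): camera k is triggered k * delta after
    camera 0, with delta = 1 / (num_cameras * camera_fps).
    """
    frame_period = Fraction(1, camera_fps)   # time between frames of one camera
    delta = frame_period / num_cameras       # stagger between adjacent cameras
    events = []
    for cycle in range(num_cycles):
        for cam in range(num_cameras):
            events.append((cycle * frame_period + cam * delta, cam))
    return events


def effective_rate(events):
    """Rate at which new views arrive, given evenly spaced capture events."""
    gaps = {later[0] - earlier[0] for earlier, later in zip(events, events[1:])}
    if len(gaps) != 1:
        raise ValueError("capture events are not evenly spaced")
    return 1 / gaps.pop()


# Illustrative numbers only: 4 cameras, each recording at 50 frames per second.
events = interleaved_schedule(num_cameras=4, camera_fps=50, num_cycles=3)
rate = effective_rate(events)
```

With these illustrative values each camera still records at 50 frames per second, but a new view arrives every 1/200 of a second, which is the theoretical $N$-fold increase described in the abstract. Exact fractions are used so that the even-spacing check is not affected by floating-point rounding.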

Cite

Text

Li et al. "From Sparse to Dense: Spatio-Temporal Fusion for Multi-View 3D Human Pose Estimation with DenseWarper." International Conference on Learning Representations, 2026.

Markdown

[Li et al. "From Sparse to Dense: Spatio-Temporal Fusion for Multi-View 3D Human Pose Estimation with DenseWarper." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/li2026iclr-sparse/)

BibTeX

@inproceedings{li2026iclr-sparse,
  title     = {{From Sparse to Dense: Spatio-Temporal Fusion for Multi-View 3D Human Pose Estimation with DenseWarper}},
  author    = {Li, Ling and Chen, Changjie and Wang, Yuyan and Lyu, Jiaqing and Chang, Kenglun and Chen, Yiyun and Deng, Zhidong},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/li2026iclr-sparse/}
}