CHAMP: Conformalized 3D Human Multi-Hypothesis Pose Estimators

Abstract

We introduce CHAMP, a novel method for learning sequence-to-sequence, multi-hypothesis 3D human poses from 2D keypoints by leveraging a conditional distribution with a diffusion model. To predict a single output 3D pose sequence, we generate and aggregate multiple 3D pose hypotheses. For better aggregation results, we develop a method to score these hypotheses during training, effectively integrating conformal prediction into the learning process. This process results in a differentiable conformal predictor that is trained end-to-end with the 3D pose estimator. After training, the learned scoring model is used as the conformity score, and the 3D pose estimator is combined with a conformal predictor to select the most accurate hypotheses for downstream aggregation. Our results indicate that simple mean aggregation over the hypothesis set filtered by conformal prediction yields competitive results. When integrated with more sophisticated aggregation techniques, our method achieves state-of-the-art performance across various metrics and datasets while inheriting the probabilistic guarantees of conformal prediction.
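
The filter-then-aggregate step described above can be illustrated with a minimal sketch of standard split conformal prediction. This is not the authors' implementation: the function names, the toy scores, and the fallback for an empty filtered set are assumptions made for illustration, and the scores are treated as nonconformity values (lower is better), whereas CHAMP obtains its scores from a learned model. Under exchangeability, the score of a fresh ground-truth pose falls at or below the calibrated threshold with probability at least 1 - alpha.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Split-conformal threshold: the ceil((n + 1) * (1 - alpha))-th smallest
    calibration score, where n is the calibration set size."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: no finite threshold exists.
        return math.inf
    return sorted(cal_scores)[k - 1]


def filter_hypotheses(hypotheses, scores, threshold):
    """Keep hypotheses whose nonconformity score is at or below the threshold."""
    kept = [h for h, s in zip(hypotheses, scores) if s <= threshold]
    # Assumption: fall back to all hypotheses if the filtered set is empty.
    return kept if kept else list(hypotheses)


def mean_pose(hypotheses):
    """Joint-wise mean over hypotheses; each hypothesis is a list of (x, y, z)."""
    m = len(hypotheses)
    return [
        tuple(sum(h[j][c] for h in hypotheses) / m for c in range(3))
        for j in range(len(hypotheses[0]))
    ]


# Toy data (hypothetical): nine calibration scores, three two-joint hypotheses.
cal_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
threshold = conformal_threshold(cal_scores, alpha=0.5)

hypotheses = [
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [(2.0, 2.0, 2.0), (3.0, 3.0, 3.0)],
    [(10.0, 10.0, 10.0), (10.0, 10.0, 10.0)],  # outlier hypothesis
]
scores = [0.2, 0.4, 0.9]

kept = filter_hypotheses(hypotheses, scores, threshold)
aggregated = mean_pose(kept)
```

In this toy run the outlier hypothesis scores above the threshold and is dropped before averaging, so the aggregated pose is the mean of the two remaining hypotheses. A more sophisticated aggregator could replace `mean_pose` without changing the filtering step.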

Cite

Text

Zhang and Carlone. "CHAMP: Conformalized 3D Human Multi-Hypothesis Pose Estimators." International Conference on Learning Representations, 2025.

Markdown

[Zhang and Carlone. "CHAMP: Conformalized 3D Human Multi-Hypothesis Pose Estimators." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/zhang2025iclr-champ/)

BibTeX

@inproceedings{zhang2025iclr-champ,
  title     = {{CHAMP: Conformalized 3D Human Multi-Hypothesis Pose Estimators}},
  author    = {Zhang, Harry and Carlone, Luca},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/zhang2025iclr-champ/}
}