Pose Modulated Avatars from Video

Abstract

It is now possible to reconstruct dynamic human motion and shape from a sparse set of cameras using Neural Radiance Fields (NeRF) driven by an underlying skeleton. However, modeling the deformation of cloth and skin in relation to the skeleton pose remains a challenge. Unlike existing avatar models that are learned implicitly or rely on a proxy surface, our approach is motivated by the observation that different poses necessitate unique frequency assignments. Neglecting this distinction yields noisy artifacts in smooth areas or blurs fine-grained texture and shape details in sharp regions. We develop a two-branch neural network that is adaptive and explicit in the frequency domain. The first branch is a graph neural network that models correlations among body parts locally, taking skeleton pose as input. The second branch combines these correlation features into a set of global frequencies and then modulates the feature encoding. Our experiments demonstrate that our network outperforms state-of-the-art methods in detail preservation and generalization. Our code is available at https://github.com/ChunjinSong/PM-Avatars.
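To make the two-branch design concrete, the following is a minimal PyTorch sketch (not the authors' code) of pose-dependent frequency modulation: a small graph branch propagates per-joint pose features along the skeleton, a second branch pools them into global frequencies, and those frequencies replace the fixed sinusoidal frequencies of a standard NeRF positional encoding. The class names, layer sizes, single round of adjacency averaging, and softplus activation are illustrative assumptions.

import torch
import torch.nn as nn


class PoseGraphBranch(nn.Module):
    """Branch 1 (sketch): propagate per-joint pose features along the skeleton graph."""

    def __init__(self, num_joints, in_dim=9, hidden=64, adjacency=None):
        super().__init__()
        # Fixed skeleton adjacency (joints x joints); identity fallback if none is given.
        adj = adjacency if adjacency is not None else torch.eye(num_joints)
        self.register_buffer("adj", adj / adj.sum(-1, keepdim=True))
        self.encode = nn.Linear(in_dim, hidden)
        self.message = nn.Linear(hidden, hidden)

    def forward(self, joint_feats):  # (B, J, in_dim), e.g. flattened joint rotations
        h = torch.relu(self.encode(joint_feats))
        # One round of neighbourhood averaging models local correlations among body parts.
        h = torch.relu(self.message(self.adj @ h))
        return h  # (B, J, hidden)


class FrequencyBranch(nn.Module):
    """Branch 2 (sketch): map correlation features to a set of global frequencies."""

    def __init__(self, hidden=64, num_freqs=8):
        super().__init__()
        self.to_freq = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, num_freqs)
        )

    def forward(self, part_feats):  # (B, J, hidden)
        pooled = part_feats.mean(dim=1)  # aggregate body parts into a global feature
        return nn.functional.softplus(self.to_freq(pooled))  # positive frequencies (B, F)


def modulated_encoding(x, freqs):
    """Sinusoidal encoding of query points x with pose-predicted frequencies."""
    # x: (B, N, 3) sample points; freqs: (B, F)
    phase = x.unsqueeze(-1) * freqs[:, None, None, :]  # (B, N, 3, F)
    enc = torch.cat([torch.sin(phase), torch.cos(phase)], dim=-1)
    return enc.flatten(start_dim=2)  # (B, N, 3 * 2F)

In this sketch, the resulting encoding would then be fed to a standard NeRF MLP; the key difference from a vanilla positional encoding is that the frequencies are predicted from the skeleton pose rather than fixed to powers of two.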

Cite

Text

Song et al. "Pose Modulated Avatars from Video." International Conference on Learning Representations, 2024.

Markdown

[Song et al. "Pose Modulated Avatars from Video." International Conference on Learning Representations, 2024.](https://mlanthology.org/iclr/2024/song2024iclr-pose/)

BibTeX

@inproceedings{song2024iclr-pose,
  title     = {{Pose Modulated Avatars from Video}},
  author    = {Song, Chunjin and Wandt, Bastian and Rhodin, Helge},
  booktitle = {International Conference on Learning Representations},
  year      = {2024},
  url       = {https://mlanthology.org/iclr/2024/song2024iclr-pose/}
}