Pose Modulated Avatars from Video
Abstract
It is now possible to reconstruct dynamic human motion and shape from a sparse set of cameras using Neural Radiance Fields (NeRF) driven by an underlying skeleton. However, modeling the deformation of cloth and skin as a function of skeleton pose remains a challenge. Unlike existing avatar models that are learned implicitly or rely on a proxy surface, our approach is motivated by the observation that different poses necessitate unique frequency assignments. Neglecting this distinction yields noisy artifacts in smooth areas or blurs fine-grained texture and shape details in sharp regions. We develop a two-branch neural network that is adaptive and explicit in the frequency domain. The first branch is a graph neural network that models correlations among body parts locally, taking the skeleton pose as input. The second branch combines these correlation features into a set of global frequencies and then modulates the feature encoding. Our experiments demonstrate that our network outperforms state-of-the-art methods in preserving details and in generalization. Our code is available at https://github.com/ChunjinSong/PM-Avatars.
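To make the two-branch design concrete, the following is a minimal PyTorch sketch of the idea, not the paper's implementation (the module names, the mean-pooling and sigmoid gating, and all dimensions are illustrative assumptions; the actual code is in the repository linked above): one branch propagates per-joint pose features along the skeleton graph, and the other pools them into global frequency scales that modulate a Fourier positional encoding of the query point.

```python
# Hypothetical sketch, NOT the authors' implementation -- see the repository above.
import torch
import torch.nn as nn


class GraphBranch(nn.Module):
    """Illustrative stand-in for the first branch: simple graph message
    passing over a fixed, normalized skeleton adjacency matrix."""

    def __init__(self, in_dim, feat_dim, adjacency):
        super().__init__()
        self.register_buffer("adj", adjacency)     # (J, J), assumed row-normalized
        self.lin1 = nn.Linear(in_dim, feat_dim)
        self.lin2 = nn.Linear(feat_dim, feat_dim)

    def forward(self, pose):                       # pose: (B, J, in_dim)
        h = torch.relu(self.lin1(pose))
        h = torch.relu(self.lin2(self.adj @ h))    # mix features along bones
        return h                                   # (B, J, feat_dim)


class FrequencyBranch(nn.Module):
    """Illustrative stand-in for the second branch: pool per-joint features
    into global frequency scales that modulate a Fourier encoding of x."""

    def __init__(self, feat_dim, num_freqs):
        super().__init__()
        self.to_scale = nn.Linear(feat_dim, num_freqs)
        bands = 2.0 ** torch.arange(num_freqs, dtype=torch.float32)
        self.register_buffer("bands", bands)       # fixed base frequency bands

    def forward(self, joint_feats, x):             # joint_feats: (B, J, D), x: (B, N, 3)
        g = joint_feats.mean(dim=1)                # (B, D) global pooling over joints
        scales = torch.sigmoid(self.to_scale(g))   # (B, F) pose-dependent gates in (0, 1)
        freqs = scales * self.bands                # (B, F) modulated frequencies
        ang = x.unsqueeze(-1) * freqs[:, None, None, :]             # (B, N, 3, F)
        enc = torch.cat([torch.sin(ang), torch.cos(ang)], dim=-1)   # (B, N, 3, 2F)
        return enc.flatten(start_dim=2)            # (B, N, 6F), input to the NeRF MLP


# Toy usage: 24 joints, 6-D per-joint pose input, 8 frequency bands.
adj = torch.eye(24)                                # placeholder; a real skeleton graph goes here
gnn = GraphBranch(in_dim=6, feat_dim=64, adjacency=adj)
freq = FrequencyBranch(feat_dim=64, num_freqs=8)
encoding = freq(gnn(torch.randn(2, 24, 6)), torch.rand(2, 1024, 3))
print(encoding.shape)                              # torch.Size([2, 1024, 48])
```

The pose-dependent gates let the encoding suppress high-frequency bands for poses over smooth regions and retain them where sharp detail is needed, which is the intuition the abstract describes.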
Cite
Text
Song et al. "Pose Modulated Avatars from Video." International Conference on Learning Representations, 2024.
Markdown
[Song et al. "Pose Modulated Avatars from Video." International Conference on Learning Representations, 2024.](https://mlanthology.org/iclr/2024/song2024iclr-pose/)
BibTeX
@inproceedings{song2024iclr-pose,
  title     = {{Pose Modulated Avatars from Video}},
  author    = {Song, Chunjin and Wandt, Bastian and Rhodin, Helge},
  booktitle = {International Conference on Learning Representations},
  year      = {2024},
  url       = {https://mlanthology.org/iclr/2024/song2024iclr-pose/}
}