Disentangling Planning, Driving and Rendering for Photorealistic Avatar Agents

Abstract

In this study, our goal is to create interactive avatar agents that can autonomously animate nuanced facial movements that are realistic from both visual and behavioral perspectives. Given high-level inputs about the environment and agent profile, our framework harnesses LLMs to produce a series of detailed text descriptions of the avatar agents’ facial motions. These descriptions are then processed by our task-agnostic driving engine into motion token sequences, which are subsequently converted into continuous motion embeddings that are further consumed by our standalone neural renderer to generate the final photorealistic avatar animations. To our knowledge, we are the first to utilize the planning and reasoning abilities of LLMs together with neural rendering for generalized non-verbal prediction and photorealistic rendering of avatar agents.
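The abstract describes a three-stage, disentangled pipeline: an LLM planner turns high-level context into text descriptions of facial motion, a task-agnostic driving engine maps those descriptions to motion tokens and then continuous embeddings, and a standalone neural renderer produces the frames. The Python sketch below is purely illustrative of this interface separation; the class names (AgentContext, Planner, DrivingEngine, NeuralRenderer) and all stubbed logic are assumptions for exposition, not the authors' implementation.

# Hypothetical sketch of the three disentangled stages, with placeholder logic.
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class AgentContext:
    profile: str       # high-level agent persona, e.g. "a cheerful host"
    environment: str   # high-level scene description, e.g. "greeting a guest"


class Planner:
    """Stage 1: an LLM turns high-level context into fine-grained motion text."""

    def plan(self, ctx: AgentContext) -> List[str]:
        # In the paper this comes from prompting an LLM; here it is hard-coded.
        return [
            f"{ctx.profile} raises eyebrows slightly while {ctx.environment}",
            "smiles with narrowed eyes",
            "nods twice and relaxes the jaw",
        ]


class DrivingEngine:
    """Stage 2: a task-agnostic driver maps each description to discrete motion
    tokens, then decodes them into continuous motion embeddings."""

    def __init__(self, vocab_size: int = 512, embed_dim: int = 64):
        rng = np.random.default_rng(0)
        self.codebook = rng.normal(size=(vocab_size, embed_dim))  # placeholder codebook

    def text_to_tokens(self, description: str) -> np.ndarray:
        # Placeholder: a real system would use a learned text-to-motion tokenizer.
        return np.array([hash(w) % len(self.codebook) for w in description.split()])

    def tokens_to_embeddings(self, tokens: np.ndarray) -> np.ndarray:
        # Look up each token in the codebook: (T, embed_dim) motion embeddings.
        return self.codebook[tokens]


class NeuralRenderer:
    """Stage 3: a standalone renderer consumes motion embeddings frame by frame."""

    def render(self, motion: np.ndarray) -> np.ndarray:
        # Placeholder: return blank RGB frames instead of photorealistic images.
        return np.zeros((motion.shape[0], 256, 256, 3), dtype=np.uint8)


if __name__ == "__main__":
    ctx = AgentContext(profile="a cheerful host", environment="greeting a guest")
    planner, driver, renderer = Planner(), DrivingEngine(), NeuralRenderer()
    for desc in planner.plan(ctx):
        tokens = driver.text_to_tokens(desc)
        frames = renderer.render(driver.tokens_to_embeddings(tokens))
        print(f"{desc!r} -> {len(tokens)} tokens -> {frames.shape[0]} frames")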

Cite

Text

Wang et al. "Disentangling Planning, Driving and Rendering for Photorealistic Avatar Agents." European Conference on Computer Vision Workshops, 2024. doi:10.1007/978-3-031-91578-9_8

Markdown

[Wang et al. "Disentangling Planning, Driving and Rendering for Photorealistic Avatar Agents." European Conference on Computer Vision Workshops, 2024.](https://mlanthology.org/eccvw/2024/wang2024eccvw-disentangling/) doi:10.1007/978-3-031-91578-9_8

BibTeX

@inproceedings{wang2024eccvw-disentangling,
  title     = {{Disentangling Planning, Driving and Rendering for Photorealistic Avatar Agents}},
  author    = {Wang, Duomin and Dai, Bin and Deng, Yu and Wang, Baoyuan},
  booktitle = {European Conference on Computer Vision Workshops},
  year      = {2024},
  pages     = {137--147},
  doi       = {10.1007/978-3-031-91578-9_8},
  url       = {https://mlanthology.org/eccvw/2024/wang2024eccvw-disentangling/}
}