GECO: GPT-Driven Estimation of 3D Human-Scene Contact in the Wild
Abstract
Understanding human-scene contact remains a challenging task, as it requires detectors to simultaneously model the contacting body parts, their proximity to scene objects, and the overall scene context. In this work, we introduce GECO, a framework employing Large Language Models (LLMs) with the key insight that language offers a powerful prior to intuitively reason about 3D human-object and human-scene contact based on extensive multimodal world knowledge. By converting a body-vertex formulation to natural language descriptors, we enable zero-shot generation of vertex-level contact directly on the SMPL body. We show that GPT offers a surprisingly competitive baseline close to state-of-the-art detectors on the DAMON dataset. We apply and evaluate different emerging prompting paradigms, highlighting their potential and limitations towards LLM-based human-scene contact estimation.
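The abstract describes converting between a body-vertex formulation and natural-language part descriptors so an LLM can reason about contact. A minimal sketch of the decoding direction, assuming a part-name-to-vertex-index segmentation (the part names and index ranges below are toy placeholders, not the paper's actual SMPL segmentation):

```python
# Hypothetical sketch: expanding part-level contact labels (as an LLM might
# emit them) into a per-vertex binary contact mask on the SMPL body.
# The part names and vertex-index ranges are toy placeholders.
TOY_PART_TO_VERTICES = {
    "left_hand": list(range(0, 40)),
    "right_hand": list(range(40, 80)),
    "buttocks": list(range(80, 150)),
}

def parts_to_vertex_mask(contact_parts, part_to_vertices, num_vertices=6890):
    """Turn a set of contacting part names into a binary mask over the
    SMPL template mesh (6890 vertices in the standard SMPL body)."""
    mask = [0] * num_vertices
    for part in contact_parts:
        for v in part_to_vertices.get(part, []):
            mask[v] = 1
    return mask

mask = parts_to_vertex_mask({"left_hand", "buttocks"}, TOY_PART_TO_VERTICES)
```

The reverse direction (vertices to part names) would look up which segment each contacting vertex falls into, yielding the compact descriptors the LLM is prompted with.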
Cite
Text
Lee et al. "GECO: GPT-Driven Estimation of 3D Human-Scene Contact in the Wild." European Conference on Computer Vision Workshops, 2024. doi:10.1007/978-3-031-92591-7_29
Markdown
[Lee et al. "GECO: GPT-Driven Estimation of 3D Human-Scene Contact in the Wild." European Conference on Computer Vision Workshops, 2024.](https://mlanthology.org/eccvw/2024/lee2024eccvw-geco/) doi:10.1007/978-3-031-92591-7_29
BibTeX
@inproceedings{lee2024eccvw-geco,
title = {{GECO: GPT-Driven Estimation of 3D Human-Scene Contact in the Wild}},
author = {Lee, Chaehong and Singh, Simranjit and Fore, Michael and Pavlakos, Georgios and Stamoulis, Dimitrios},
booktitle = {European Conference on Computer Vision Workshops},
year = {2024},
pages = {436--450},
doi = {10.1007/978-3-031-92591-7_29},
url = {https://mlanthology.org/eccvw/2024/lee2024eccvw-geco/}
}