Human-in-the-Loop Local Corrections of 3D Scene Layouts via Infilling
Abstract
We present a novel human-in-the-loop approach to estimating 3D scene layouts that uses human feedback from an egocentric standpoint. We study this approach by introducing a novel local correction task, in which users identify local errors and prompt a model to correct them automatically. Building on SceneScript, a state-of-the-art framework for 3D scene layout estimation that leverages structured language, we propose a solution that frames this problem as "infilling", a task studied in natural language processing. We train a multi-task version of SceneScript that maintains performance on global predictions while significantly improving its local correction ability. We integrate this model into a human-in-the-loop system, enabling a user to iteratively refine scene layout estimates via a low-friction "one-click fix" workflow. Our system allows the final refined layout to diverge from the training distribution, enabling more accurate modelling of complex layouts.
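As a rough illustration of the infilling formulation, the sketch below shows how a local correction could be posed on a layout written as a sequence of structured-language commands: the entity flagged by the user is replaced with a mask token, and the model is asked to generate only the masked span. The abstract does not specify the token format, so the `<mask>` and `<sep>` sentinels, the command strings, and both helper functions are assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Tuple

MASK = "<mask>"  # hypothetical sentinel marking the span the user flagged
SEP = "<sep>"    # hypothetical separator after which the model emits the infill


def make_infilling_example(commands: List[str], bad_index: int) -> Tuple[List[str], List[str]]:
    """Build (model_input, target) for correcting one command in a layout sequence."""
    if not 0 <= bad_index < len(commands):
        raise IndexError("bad_index out of range")
    prefix = commands[:bad_index]
    suffix = commands[bad_index + 1:]
    # The model sees the context on both sides of the masked entity.
    model_input = prefix + [MASK] + suffix + [SEP]
    # During training the target would be the ground-truth entity.
    target = [commands[bad_index]]
    return model_input, target


def apply_correction(model_input: List[str], predicted: List[str]) -> List[str]:
    """Splice the predicted span back into the sequence in place of the mask."""
    body = model_input[:-1]  # drop the trailing separator
    i = body.index(MASK)
    return body[:i] + predicted + body[i + 1:]


# Toy layout; the command strings are placeholders, not real model output.
layout = ["wall a", "door b", "window c"]
model_input, target = make_infilling_example(layout, bad_index=1)
corrected = apply_correction(model_input, ["door b_fixed"])
```

In this toy setup a "one-click fix" corresponds to choosing `bad_index`; the rest of the sequence is left untouched, so repeated corrections refine the layout one entity at a time.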
Cite
Text
Xie et al. "Human-in-the-Loop Local Corrections of 3D Scene Layouts via Infilling." International Conference on Computer Vision, 2025.
Markdown
[Xie et al. "Human-in-the-Loop Local Corrections of 3D Scene Layouts via Infilling." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/xie2025iccv-humanintheloop/)
BibTeX
@inproceedings{xie2025iccv-humanintheloop,
title = {{Human-in-the-Loop Local Corrections of 3D Scene Layouts via Infilling}},
author = {Xie, Christopher and Avetisyan, Armen and Howard-Jenkins, Henry and Siddiqui, Yawar and Straub, Julian and Newcombe, Richard and Balntas, Vasileios and Engel, Jakob},
booktitle = {International Conference on Computer Vision},
year = {2025},
pages = {5657-5666},
url = {https://mlanthology.org/iccv/2025/xie2025iccv-humanintheloop/}
}