ReALFRED: An Embodied Instruction Following Benchmark in Photo-Realistic Environments
Abstract
Simulated virtual environments have been widely used to train robotic agents that perform daily household tasks. These environments have driven considerable research progress, but they often provide limited object interactability, visual appearance that differs from real-world environments, or relatively small environment sizes. This prevents models learned in virtual scenes from being readily deployable. To bridge the gap between these learning environments and deployment (i.e., real) environments, we propose ReALFRED, a benchmark that employs real-world scenes, objects, and room layouts to train agents to complete household tasks by understanding free-form language instructions and interacting with objects in large, multi-room, 3D-captured scenes. Specifically, we extend the ALFRED benchmark with updates for larger environmental spaces with smaller visual domain gaps. With ReALFRED, we analyze methods previously crafted for the ALFRED benchmark and observe that they consistently yield lower performance in all metrics, encouraging the community to develop methods in more realistic environments. Our code and data are publicly available at https://github.com/snumprlab/realfred.
Cite
Text
Kim et al. "ReALFRED: An Embodied Instruction Following Benchmark in Photo-Realistic Environments." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72684-2_20
Markdown
[Kim et al. "ReALFRED: An Embodied Instruction Following Benchmark in Photo-Realistic Environments." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/kim2024eccv-realfred/) doi:10.1007/978-3-031-72684-2_20
BibTeX
@inproceedings{kim2024eccv-realfred,
title = {{ReALFRED: An Embodied Instruction Following Benchmark in Photo-Realistic Environments}},
author = {Kim, Taewoong and Min, Cheolhong and Kim, Byeonghwi and Kim, Jinyeon and Jeung, Wonje and Choi, Jonghyun},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2024},
doi = {10.1007/978-3-031-72684-2_20},
url = {https://mlanthology.org/eccv/2024/kim2024eccv-realfred/}
}