Housekeep: Tidying Virtual Households Using Commonsense Reasoning

Abstract

We introduce Housekeep, a benchmark to evaluate commonsense reasoning in the home for embodied AI. In Housekeep, an embodied agent must tidy a house by rearranging misplaced objects without explicit instructions specifying which objects need to be rearranged. Instead, the agent must learn from, and is evaluated against, human preferences about which objects belong where in a tidy house. Specifically, we collect a dataset of where humans typically place objects in tidy and untidy houses, comprising 1799 objects, 268 object categories, 585 placements, and 105 rooms. Next, we propose a modular baseline approach for Housekeep that integrates planning, exploration, and navigation. It leverages a fine-tuned large language model (LLM) trained on an internet text corpus for effective planning. We show that our baseline generalizes to rearranging unseen objects in unknown environments.

Cite

Text

Kant et al. "Housekeep: Tidying Virtual Households Using Commonsense Reasoning." Proceedings of the European Conference on Computer Vision (ECCV), 2022. doi:10.1007/978-3-031-19842-7_21

Markdown

[Kant et al. "Housekeep: Tidying Virtual Households Using Commonsense Reasoning." Proceedings of the European Conference on Computer Vision (ECCV), 2022.](https://mlanthology.org/eccv/2022/kant2022eccv-housekeep/) doi:10.1007/978-3-031-19842-7_21

BibTeX

@inproceedings{kant2022eccv-housekeep,
  title     = {{Housekeep: Tidying Virtual Households Using Commonsense Reasoning}},
  author    = {Kant, Yash and Ramachandran, Arun and Yenamandra, Sriram and Gilitschenski, Igor and Batra, Dhruv and Szot, Andrew and Agrawal, Harsh},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2022},
  doi       = {10.1007/978-3-031-19842-7_21},
  url       = {https://mlanthology.org/eccv/2022/kant2022eccv-housekeep/}
}