Chain-of-Imagination for Reliable Instruction Following in Decision Making

Abstract

Enabling an embodied agent to imagine future states step by step, and to approach these situation-aware states one after another, can improve its ability to make reliable action decisions from textual instructions. In this work, we introduce a simple but effective mechanism called Chain-of-Imagination (CoI), which repeatedly invokes a diffusion model equipped with a Multimodal Large Language Model (MLLM) to imagine a series of intermediate, situation-aware visual sub-goals and act on them one by one, resulting in more reliable instruction following. Based on the CoI mechanism, we propose DecisionDreamer, an embodied agent that serves as a low-level controller and can be adapted to different open-world scenarios. Extensive experiments demonstrate that DecisionDreamer achieves more reliable and accurate decision-making and significantly outperforms state-of-the-art generalist agents in instruction-following capability in the Minecraft and CALVIN sandbox simulators. For more demos, please see https://sites.google.com/view/decisiondreamer.
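The imagine-then-act loop described in the abstract can be illustrated with a minimal toy sketch. This is not the DecisionDreamer implementation: the function `chain_of_imagination`, its parameters, and the integer "world" below are all hypothetical stand-ins. In particular, `imagine` takes the place of the MLLM-equipped diffusion model and `act` takes the place of the low-level controller.

```python
from typing import Callable, List, Tuple


def chain_of_imagination(
    state: int,
    instruction: int,
    imagine: Callable[[int, int], int],
    act: Callable[[int, int], int],
    is_done: Callable[[int, int], bool],
    max_subgoals: int = 10,
    steps_per_subgoal: int = 5,
) -> Tuple[int, List[int]]:
    """Alternate between imagining one sub-goal and acting toward it."""
    subgoals: List[int] = []
    for _ in range(max_subgoals):
        if is_done(state, instruction):
            break
        # Imagine the next sub-goal from the *current* state (situation-aware).
        subgoal = imagine(state, instruction)
        subgoals.append(subgoal)
        # Act toward that single sub-goal before imagining the next one.
        for _ in range(steps_per_subgoal):
            state = act(state, subgoal)
    return state, subgoals


# Toy world (assumption): the state is an integer position and the
# "instruction" is simply a target position.
def toy_imagine(state: int, target: int) -> int:
    # Propose a sub-goal at most 3 units away from the current state.
    return state + max(-3, min(3, target - state))


def toy_act(state: int, subgoal: int) -> int:
    # Move one unit toward the sub-goal, or stay put once it is reached.
    if state < subgoal:
        return state + 1
    if state > subgoal:
        return state - 1
    return state


def toy_is_done(state: int, target: int) -> bool:
    return state == target


final_state, subgoals = chain_of_imagination(
    state=0,
    instruction=7,
    imagine=toy_imagine,
    act=toy_act,
    is_done=toy_is_done,
)
print(final_state, subgoals)  # prints: 7 [3, 6, 7]
```

Because each sub-goal is re-imagined from the state the agent actually reached, the chain adapts to where the agent is rather than following a plan fixed in advance, which is the property the abstract refers to as situation-aware.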

Cite

Text

Zhou et al. "Chain-of-Imagination for Reliable Instruction Following in Decision Making." NeurIPS 2024 Workshops: OWA, 2024.

Markdown

[Zhou et al. "Chain-of-Imagination for Reliable Instruction Following in Decision Making." NeurIPS 2024 Workshops: OWA, 2024.](https://mlanthology.org/neuripsw/2024/zhou2024neuripsw-chainofimagination/)

BibTeX

@inproceedings{zhou2024neuripsw-chainofimagination,
  title     = {{Chain-of-Imagination for Reliable Instruction Following in Decision Making}},
  author    = {Zhou, Enshen and Qin, Yiran and Yin, Zhenfei and Huang, Yuzhou and Zhang, Ruimao and Sheng, Lu and Qiao, Yu and Shao, Jing},
  booktitle = {NeurIPS 2024 Workshops: OWA},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/zhou2024neuripsw-chainofimagination/}
}