Chain-of-Thought Predictive Control

Abstract

We study generalizable policy learning from demonstrations for complex low-level control (e.g., contact-rich object manipulations). We propose a novel hierarchical imitation learning method that utilizes sub-optimal demos. First, we propose an observation space-agnostic approach that efficiently discovers the multi-step subskill decomposition of the demos in an unsupervised manner. By grouping temporally close and functionally similar actions into subskill-level demo segments, the observations at the segment boundaries constitute a chain of planning steps for the task, which we refer to as the chain-of-thought (CoT). Next, we propose a Transformer-based design that effectively learns to predict the CoT as the subskill-level guidance. We couple action and subskill predictions via learnable prompt tokens and a hybrid masking strategy, which enable dynamically updated guidance at test time and improve feature representation of the trajectory for generalizable policy learning. Our method, Chain-of-Thought Predictive Control (CoTPC), consistently surpasses existing strong baselines on various challenging low-level manipulation tasks with sub-optimal demos. See project page at https://sites.google.com/view/cotpc.
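To make the segmentation idea concrete, the following is a minimal illustrative sketch, not the authors' algorithm. It assumes that "functionally similar" can be approximated by cosine similarity between consecutive action vectors and that a fixed threshold marks subskill boundaries. The names `segment_demo`, `chain_of_thought`, and `threshold` are hypothetical and do not come from the paper.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)

def segment_demo(actions, threshold=0.5):
    """Greedily group consecutive, similar actions into segments.

    Assumption: a new segment starts whenever the similarity between
    neighboring actions drops below `threshold`.
    Returns a list of inclusive (start, end) index pairs.
    """
    if not actions:
        return []
    segments = []
    start = 0
    for t in range(1, len(actions)):
        if cosine(actions[t - 1], actions[t]) < threshold:
            segments.append((start, t - 1))
            start = t
    segments.append((start, len(actions) - 1))
    return segments

def chain_of_thought(observations, segments):
    """Collect the observation at the end boundary of each segment."""
    return [observations[end] for _, end in segments]

# Toy demo: 2-D actions that change direction twice.
actions = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9), (-1.0, 0.0)]
observations = ["o0", "o1", "o2", "o3", "o4"]

segments = segment_demo(actions)
cot = chain_of_thought(observations, segments)
print(segments)  # [(0, 1), (2, 3), (4, 4)]
print(cot)       # ['o1', 'o3', 'o4']
```

In this toy setting the boundary observations `o1`, `o3`, and `o4` play the role of the chain of planning steps. The sketch uses only action vectors to find boundaries, which loosely mirrors the observation space-agnostic property described in the abstract; the paper's actual grouping criterion may differ.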

Cite

Text

Jia et al. "Chain-of-Thought Predictive Control." International Conference on Machine Learning, 2024.

Markdown

[Jia et al. "Chain-of-Thought Predictive Control." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/jia2024icml-chainofthought/)

BibTeX

@inproceedings{jia2024icml-chainofthought,
  title     = {{Chain-of-Thought Predictive Control}},
  author    = {Jia, Zhiwei and Thumuluri, Vineet and Liu, Fangchen and Chen, Linghao and Huang, Zhiao and Su, Hao},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {21768-21790},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/jia2024icml-chainofthought/}
}