InterACT: Inter-Dependency Aware Action Chunking with Hierarchical Attention Transformers for Bimanual Manipulation
Abstract
We present InterACT: Inter-dependency aware Action Chunking with Hierarchical Attention Transformers, a novel imitation learning framework for bimanual manipulation that integrates hierarchical attention to capture inter-dependencies between dual-arm joint states and visual inputs. InterACT consists of a Hierarchical Attention Encoder and a Multi-arm Decoder, both designed to enhance information aggregation and coordination. The encoder processes multi-modal inputs through segment-wise and cross-segment attention mechanisms, while the decoder leverages synchronization blocks to refine individual action predictions, providing the counterpart’s prediction as context. Our experiments on a variety of simulated and real-world bimanual manipulation tasks demonstrate that InterACT significantly outperforms existing methods. Detailed ablation studies validate the contributions of key components of our work, including the impact of CLS tokens, cross-segment encoders, and synchronization blocks.
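Below is a minimal, illustrative sketch of the hierarchical attention ideas the abstract describes: segment-wise self-attention with a learned CLS token per segment, cross-segment attention over the CLS summaries, and a synchronization block that lets each arm's decoder attend to the counterpart arm's predicted action chunk. All module names, dimensions, and the wiring shown here are assumptions made for illustration; this is not the authors' reference implementation.

```python
# Hedged sketch of segment-wise / cross-segment attention and a synchronization
# block, loosely following the abstract's description of InterACT. Names and
# hyperparameters are illustrative assumptions, not the paper's actual code.
import torch
import torch.nn as nn


class SegmentEncoder(nn.Module):
    """Segment-wise attention: each input segment (e.g. left-arm joints,
    right-arm joints, or visual features) is encoded independently, with a
    learned CLS token prepended to summarize the segment."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)

    def forward(self, seg: torch.Tensor) -> torch.Tensor:  # seg: (B, T, D)
        cls = self.cls.expand(seg.size(0), -1, -1)
        out = self.layer(torch.cat([cls, seg], dim=1))
        return out  # (B, 1+T, D); out[:, 0] is the CLS summary of the segment


class CrossSegmentEncoder(nn.Module):
    """Cross-segment attention: CLS summaries from all segments attend to one
    another, aggregating inter-dependencies across the two arms and vision."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)

    def forward(self, cls_tokens: torch.Tensor) -> torch.Tensor:  # (B, S, D)
        return self.layer(cls_tokens)


class SynchronizationBlock(nn.Module):
    """Each arm's action-chunk representation cross-attends to the counterpart
    arm's current prediction, so the two predictions are refined jointly."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, own: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
        refined, _ = self.cross(own, other, other)  # query=own, key/value=other
        return self.norm(own + refined)


if __name__ == "__main__":
    B, T, D = 2, 16, 256
    seg_enc = SegmentEncoder(D)
    left = seg_enc(torch.randn(B, T, D))    # left-arm joint-state segment
    right = seg_enc(torch.randn(B, T, D))   # right-arm joint-state segment
    cls = torch.stack([left[:, 0], right[:, 0]], dim=1)
    fused = CrossSegmentEncoder(D)(cls)     # inter-segment (cross-arm) context
    sync = SynchronizationBlock(D)
    left_chunk, right_chunk = torch.randn(B, 8, D), torch.randn(B, 8, D)
    left_refined = sync(left_chunk, right_chunk)  # refine left arm's chunk with right's
    print(left_refined.shape)  # torch.Size([2, 8, 256])
```

In a full model, the refined per-arm chunks would be decoded into joint-space actions; the sketch only shows how the attention stages could be composed.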
Cite
Text
Lee et al. "InterACT: Inter-Dependency Aware Action Chunking with Hierarchical Attention Transformers for Bimanual Manipulation." Proceedings of The 8th Conference on Robot Learning, 2024.
Markdown
[Lee et al. "InterACT: Inter-Dependency Aware Action Chunking with Hierarchical Attention Transformers for Bimanual Manipulation." Proceedings of The 8th Conference on Robot Learning, 2024.](https://mlanthology.org/corl/2024/lee2024corl-interact/)
BibTeX
@inproceedings{lee2024corl-interact,
title = {{InterACT: Inter-Dependency Aware Action Chunking with Hierarchical Attention Transformers for Bimanual Manipulation}},
author = {Lee, Andrew Choong-Won and Chuang, Ian and Chen, Ling-Yuan and Soltani, Iman},
booktitle = {Proceedings of The 8th Conference on Robot Learning},
year = {2024},
pages = {1730--1743},
volume = {270},
url = {https://mlanthology.org/corl/2024/lee2024corl-interact/}
}