Learning to Manipulate Anywhere: A Visual Generalizable Framework for Reinforcement Learning

Abstract

Can we endow visuomotor robots with generalization capabilities to operate in diverse open-world scenarios? In this paper, we propose Maniwhere, a generalizable framework tailored for visual reinforcement learning, enabling the trained robot policies to generalize across a combination of multiple visual disturbance types. Specifically, we introduce a multi-view representation learning approach fused with a Spatial Transformer Network (STN) module to capture shared semantic information and correspondences among different viewpoints. In addition, we employ a curriculum-based randomization and augmentation approach to stabilize the RL training process and strengthen the visual generalization ability. To demonstrate the effectiveness of Maniwhere, we meticulously design **8** tasks encompassing articulated objects, bi-manual manipulation, and dexterous hand manipulation, showing Maniwhere’s strong visual generalization and sim2real transfer abilities across **3** hardware platforms. Our experiments show that Maniwhere significantly outperforms existing state-of-the-art methods. Videos are provided at https://maniwhere.github.io.
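To make the abstract's description of the encoder concrete, below is a minimal sketch (not the authors' code) of a visual encoder with a Spatial Transformer Network (STN) front-end, one plausible way to realize a multi-view representation fused with an STN module. All module names, layer sizes, and the alignment loss are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class STNEncoder(nn.Module):
    """Image encoder preceded by an STN that learns an affine re-warping."""
    def __init__(self, in_channels=3, feature_dim=256):
        super().__init__()
        # Localization network: predicts a 2x3 affine transform per image.
        self.localization = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=7, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.fc_loc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 4 * 4, 64), nn.ReLU(),
            nn.Linear(64, 6),
        )
        # Initialize to the identity transform so training starts with no warping.
        self.fc_loc[-1].weight.data.zero_()
        self.fc_loc[-1].bias.data.copy_(
            torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))
        # Feature encoder applied to the spatially-transformed image.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(6),
            nn.Flatten(),
            nn.Linear(32 * 6 * 6, feature_dim),
        )

    def forward(self, x):
        # Predict affine parameters and resample the input accordingly.
        theta = self.fc_loc(self.localization(x)).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        x_warped = F.grid_sample(x, grid, align_corners=False)
        return self.encoder(x_warped)

def multiview_alignment_loss(z_view1, z_view2):
    """Hypothetical objective: pull features of two camera views of the same
    state together (cosine similarity), encouraging view-invariant features."""
    z1 = F.normalize(z_view1, dim=-1)
    z2 = F.normalize(z_view2, dim=-1)
    return (1.0 - (z1 * z2).sum(dim=-1)).mean()
```

Such an encoder could be trained jointly with the RL objective, with the alignment term applied to image pairs rendered from different viewpoints under the curriculum-based randomization mentioned above.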

Cite

Text

Yuan et al. "Learning to Manipulate Anywhere: A Visual Generalizable Framework for Reinforcement Learning." Proceedings of The 8th Conference on Robot Learning, 2024.

Markdown

[Yuan et al. "Learning to Manipulate Anywhere: A Visual Generalizable Framework for Reinforcement Learning." Proceedings of The 8th Conference on Robot Learning, 2024.](https://mlanthology.org/corl/2024/yuan2024corl-learning/)

BibTeX

@inproceedings{yuan2024corl-learning,
  title     = {{Learning to Manipulate Anywhere: A Visual Generalizable Framework for Reinforcement Learning}},
  author    = {Yuan, Zhecheng and Wei, Tianming and Cheng, Shuiqi and Zhang, Gu and Chen, Yuanpei and Xu, Huazhe},
  booktitle = {Proceedings of The 8th Conference on Robot Learning},
  year      = {2024},
  pages     = {1815--1833},
  volume    = {270},
  url       = {https://mlanthology.org/corl/2024/yuan2024corl-learning/}
}