Entity-Centric Reinforcement Learning for Object Manipulation from Pixels

Abstract

Manipulating objects is a hallmark of human intelligence, and an important task in domains such as robotics. In principle, Reinforcement Learning (RL) offers a general approach to learn object manipulation. In practice, however, domains with more than a few objects are difficult for RL agents due to the curse of dimensionality, especially when learning from raw image observations. In this work we propose a structured approach for visual RL that is suitable for representing multiple objects and their interaction, and use it to learn goal-conditioned manipulation of several objects. Key to our method is the ability to handle goals with dependencies between the objects (e.g., moving objects in a certain order). We further relate our architecture to the generalization capability of the trained agent, and demonstrate agents that learn with 3 objects but generalize to similar tasks with over 10 objects. Rollout videos are available on our website: https://sites.google.com/view/entity-centric-rl
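The abstract describes an entity-centric architecture for goal-conditioned visual RL without giving implementation details here. As a rough illustration of the general idea (not the authors' exact architecture), the sketch below shows a goal-conditioned policy that treats the observation and goal as sets of per-object feature vectors, mixes information across entities with self-attention, and pools them permutation-invariantly; all module names, dimensions, and the attention/pooling choices are assumptions made for the example.

# Minimal sketch of an entity-centric, goal-conditioned policy (illustrative only;
# not the paper's architecture). Per-object state and goal features are encoded
# with shared weights, so the same network handles any number of objects.
import torch
import torch.nn as nn


class EntityCentricPolicy(nn.Module):
    def __init__(self, entity_dim: int = 16, hidden_dim: int = 64, action_dim: int = 4):
        super().__init__()
        # Shared per-entity encoder applied to each object's (state, goal) pair.
        self.entity_encoder = nn.Sequential(
            nn.Linear(2 * entity_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )
        # Self-attention lets entities exchange information, which is what allows
        # handling goals with dependencies between objects (e.g., ordering).
        self.attention = nn.MultiheadAttention(hidden_dim, num_heads=4, batch_first=True)
        self.action_head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, action_dim),
        )

    def forward(self, entities: torch.Tensor, goals: torch.Tensor) -> torch.Tensor:
        # entities, goals: (batch, num_objects, entity_dim); num_objects may vary,
        # which is what enables training with few objects and evaluating with more.
        x = self.entity_encoder(torch.cat([entities, goals], dim=-1))
        x, _ = self.attention(x, x, x)   # permutation-equivariant mixing
        pooled = x.mean(dim=1)           # permutation-invariant readout
        return self.action_head(pooled)


# The same policy runs unchanged on 3 or 10 objects:
policy = EntityCentricPolicy()
for num_objects in (3, 10):
    obs = torch.randn(1, num_objects, 16)
    goal = torch.randn(1, num_objects, 16)
    print(policy(obs, goal).shape)  # torch.Size([1, 4])

Because the per-entity encoder is shared and the readout is permutation-invariant, the parameter count does not depend on the number of objects, which is the structural property that makes generalization across object counts possible in principle.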

Cite

Text

Haramati et al. "Entity-Centric Reinforcement Learning for Object Manipulation from Pixels." NeurIPS 2023 Workshops: GCRL, 2023.

Markdown

[Haramati et al. "Entity-Centric Reinforcement Learning for Object Manipulation from Pixels." NeurIPS 2023 Workshops: GCRL, 2023.](https://mlanthology.org/neuripsw/2023/haramati2023neuripsw-entitycentric/)

BibTeX

@inproceedings{haramati2023neuripsw-entitycentric,
  title     = {{Entity-Centric Reinforcement Learning for Object Manipulation from Pixels}},
  author    = {Haramati, Dan and Daniel, Tal and Tamar, Aviv},
  booktitle = {NeurIPS 2023 Workshops: GCRL},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/haramati2023neuripsw-entitycentric/}
}