Learning Object-Conditioned Exploration Using Distributed Soft Actor Critic

Abstract

Object navigation is defined as navigating to an object of a given label in a complex, unexplored environment. In its general form, this problem poses several challenges for robotics: semantic exploration of unknown environments in search of an object, and low-level control. In this work we study object-guided exploration and low-level control, and present an end-to-end trained navigation policy achieving a success rate of 0.68 and an SPL of 0.58 on unseen, visually complex scans of real homes. We propose a highly scalable implementation of an off-policy reinforcement learning algorithm, distributed Soft Actor Critic, which allows the system to utilize 98M experience steps in 24 hours on 8 GPUs. Our system learns to control a differential-drive mobile base in simulation from a stack of high-dimensional observations commonly used on robotic platforms. The learned policy exhibits object-guided exploratory behaviors and low-level control, learned purely from experience in realistic environments.
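
For readers unfamiliar with the base algorithm, the sketch below illustrates a single Soft Actor Critic update: twin critics regressed toward an entropy-regularized soft Bellman target, and an actor trained to maximize the soft value. This is a minimal, self-contained PyTorch approximation; the network sizes, hyperparameters, and fixed entropy coefficient are illustrative assumptions, not the authors' distributed implementation.

# Minimal single-step Soft Actor Critic update (illustrative sketch only;
# dimensions, networks, and hyperparameters are assumptions, not the
# paper's distributed system).
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

OBS_DIM, ACT_DIM, ALPHA, GAMMA = 32, 2, 0.2, 0.99  # assumed sizes and constants

def mlp(inp, out):
    return nn.Sequential(nn.Linear(inp, 256), nn.ReLU(), nn.Linear(256, out))

q1, q2 = mlp(OBS_DIM + ACT_DIM, 1), mlp(OBS_DIM + ACT_DIM, 1)        # twin critics
q1_targ, q2_targ = mlp(OBS_DIM + ACT_DIM, 1), mlp(OBS_DIM + ACT_DIM, 1)
q1_targ.load_state_dict(q1.state_dict())
q2_targ.load_state_dict(q2.state_dict())
policy = mlp(OBS_DIM, 2 * ACT_DIM)                                    # outputs mean and log-std

def sample_action(obs):
    """Sample a tanh-squashed Gaussian action and its log-probability."""
    mean, log_std = policy(obs).chunk(2, dim=-1)
    std = log_std.clamp(-5, 2).exp()
    noise = torch.randn_like(mean)
    act = torch.tanh(mean + std * noise)                              # squash to [-1, 1]
    log_prob = (-0.5 * noise.pow(2) - std.log()
                - 0.5 * math.log(2 * math.pi)).sum(-1)
    log_prob -= torch.log(1 - act.pow(2) + 1e-6).sum(-1)              # tanh change of variables
    return act, log_prob

q_opt = torch.optim.Adam(list(q1.parameters()) + list(q2.parameters()), lr=3e-4)
pi_opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

def sac_update(obs, act, rew, next_obs, done):
    # Critic step: regress both Qs toward the entropy-regularized target.
    with torch.no_grad():
        next_act, next_logp = sample_action(next_obs)
        nsa = torch.cat([next_obs, next_act], -1)
        target_q = torch.min(q1_targ(nsa), q2_targ(nsa)).squeeze(-1)
        y = rew + GAMMA * (1 - done) * (target_q - ALPHA * next_logp)
    sa = torch.cat([obs, act], -1)
    q_loss = F.mse_loss(q1(sa).squeeze(-1), y) + F.mse_loss(q2(sa).squeeze(-1), y)
    q_opt.zero_grad(); q_loss.backward(); q_opt.step()

    # Actor step: maximize min of twin Qs plus entropy bonus (critic grads
    # accumulated here are cleared by the next q_opt.zero_grad()).
    new_act, logp = sample_action(obs)
    sa_new = torch.cat([obs, new_act], -1)
    pi_loss = (ALPHA * logp - torch.min(q1(sa_new), q2(sa_new)).squeeze(-1)).mean()
    pi_opt.zero_grad(); pi_loss.backward(); pi_opt.step()

# Example: one update on a random batch of four transitions.
obs = torch.randn(4, OBS_DIM)
act = torch.rand(4, ACT_DIM) * 2 - 1
sac_update(obs, act, torch.randn(4), torch.randn(4, OBS_DIM), torch.zeros(4))

In the paper's distributed setting, many actors would generate experience in parallel while learners apply updates like the one above from a shared replay buffer; target networks would also be periodically synchronized, which this single-step sketch omits.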

Cite

Text

Wahid et al. "Learning Object-Conditioned Exploration Using Distributed Soft Actor Critic." Conference on Robot Learning, 2020.

Markdown

[Wahid et al. "Learning Object-Conditioned Exploration Using Distributed Soft Actor Critic." Conference on Robot Learning, 2020.](https://mlanthology.org/corl/2020/wahid2020corl-learning/)

BibTeX

@inproceedings{wahid2020corl-learning,
  title     = {{Learning Object-Conditioned Exploration Using Distributed Soft Actor Critic}},
  author    = {Wahid, Ayzaan and Stone, Austin and Chen, Kevin and Ichter, Brian and Toshev, Alexander},
  booktitle = {Conference on Robot Learning},
  year      = {2020},
  pages     = {1684--1695},
  volume    = {155},
  url       = {https://mlanthology.org/corl/2020/wahid2020corl-learning/}
}