Stochastic Policy Optimization with Heuristic Information for Robot Learning

Abstract

Stochastic policy-based deep reinforcement learning (RL) approaches have achieved remarkable success on continuous control tasks. However, applying these methods to manipulation tasks remains challenging, since the actuators of a robot manipulator require high-dimensional continuous action spaces. In this paper, we propose exploration-bounded exploration actor-critic (EBE-AC), a novel deep RL approach that combines stochastic policy optimization with interpretable human knowledge. The human knowledge is defined as heuristic information based on both physical relationships between the robot and objects and binary signals indicating whether the robot has reached certain states. The proposed approach, EBE-AC, combines an off-policy actor-critic algorithm with entropy maximization based on the heuristic information. On a robotic manipulation task, we demonstrate that EBE-AC outperforms prior state-of-the-art off-policy actor-critic deep RL algorithms in terms of sample efficiency. In addition, we find that EBE-AC can easily be combined with latent information, which further improves sample efficiency and robustness.
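
To make the abstract's core idea concrete, below is a minimal sketch (not the authors' implementation) of how a binary heuristic signal could modulate the entropy term in an off-policy, SAC-style actor update: exploration is encouraged only in states the heuristic marks as not-yet-achieved. All names and parameters here (`heuristic_signal`, `alpha`, the network sizes) are illustrative assumptions, not details from the paper.

```python
# Illustrative sketch only: heuristic-weighted entropy bonus in an actor-critic update.
import torch
import torch.nn as nn


class GaussianActor(nn.Module):
    """Simple Gaussian policy network (assumed architecture, for illustration)."""

    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, act_dim)
        self.log_std = nn.Linear(hidden, act_dim)

    def forward(self, obs):
        h = self.net(obs)
        std = self.log_std(h).clamp(-5, 2).exp()
        return torch.distributions.Normal(self.mu(h), std)


def actor_loss(actor, critic, obs, heuristic_signal, alpha=0.2):
    """Off-policy actor update with a heuristic-weighted entropy term.

    heuristic_signal: tensor in {0, 1}; 1 where the heuristic (e.g., a binary
    signal such as "gripper has not yet reached the object") says additional
    exploration is still useful in that state.
    """
    dist = actor(obs)
    action = dist.rsample()                   # reparameterized action sample
    log_prob = dist.log_prob(action).sum(-1)  # per-sample log-likelihood
    q_value = critic(obs, action)             # learned state-action value
    # Entropy bonus is switched on/off (or scaled) by the heuristic signal,
    # so maximization of entropy is bounded to heuristic-selected states.
    entropy_bonus = alpha * heuristic_signal * (-log_prob)
    return -(q_value + entropy_bonus).mean()
```

Under this reading, the heuristic information acts as an interpretable gate on exploration rather than as a reward-shaping term; how the paper actually formulates the bound should be taken from the paper itself.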

Cite

Text

Kim et al. "Stochastic Policy Optimization with Heuristic Information for Robot Learning." Conference on Robot Learning, 2021.

Markdown

[Kim et al. "Stochastic Policy Optimization with Heuristic Information for Robot Learning." Conference on Robot Learning, 2021.](https://mlanthology.org/corl/2021/kim2021corl-stochastic/)

BibTeX

@inproceedings{kim2021corl-stochastic,
  title     = {{Stochastic Policy Optimization with Heuristic Information for Robot Learning}},
  author    = {Kim, Seonghyun and Jang, Ingook and Noh, Samyeul and Kim, Hyunseok},
  booktitle = {Conference on Robot Learning},
  year      = {2021},
  pages     = {1465--1474},
  volume    = {164},
  url       = {https://mlanthology.org/corl/2021/kim2021corl-stochastic/}
}