Stochastic Policy Optimization with Heuristic Information for Robot Learning

Abstract

Stochastic policy-based deep reinforcement learning (RL) approaches have achieved remarkable success on continuous control tasks. However, applying these methods to manipulation tasks remains challenging, since the actuators of a robot manipulator require high-dimensional continuous action spaces. In this paper, we propose exploration-bounded exploration actor-critic (EBE-AC), a novel deep RL approach that combines stochastic policy optimization with interpretable human knowledge. The human knowledge is defined as heuristic information based on both physical relationships between the robot and objects and binary signals indicating whether the robot has reached certain states. The proposed approach, EBE-AC, combines an off-policy actor-critic algorithm with entropy maximization based on the heuristic information. On a robotic manipulation task, we demonstrate that EBE-AC outperforms prior state-of-the-art off-policy actor-critic deep RL algorithms in terms of sample efficiency. In addition, we find that EBE-AC can easily be combined with latent information, which further improves sample efficiency and robustness.
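
To make the abstract's core idea concrete, below is a minimal sketch (not the authors' implementation) of how a binary heuristic signal could modulate the entropy term in an off-policy, SAC-style actor update: exploration is encouraged only in states the heuristic marks as not-yet-achieved. All names and parameters here (`heuristic_signal`, `alpha`, the network sizes) are illustrative assumptions, not details from the paper.

```python
# Illustrative sketch only: heuristic-weighted entropy bonus in an actor-critic update.
import torch
import torch.nn as nn


class GaussianActor(nn.Module):
    """Simple Gaussian policy network (assumed architecture, for illustration)."""

    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, act_dim)
        self.log_std = nn.Linear(hidden, act_dim)

    def forward(self, obs):
        h = self.net(obs)
        std = self.log_std(h).clamp(-5, 2).exp()
        return torch.distributions.Normal(self.mu(h), std)


def actor_loss(actor, critic, obs, heuristic_signal, alpha=0.2):
    """Off-policy actor update with a heuristic-weighted entropy term.

    heuristic_signal: tensor in {0, 1}; 1 where the heuristic (e.g., a binary
    signal such as "gripper has not yet reached the object") says additional
    exploration is still useful in that state.
    """
    dist = actor(obs)
    action = dist.rsample()                   # reparameterized action sample
    log_prob = dist.log_prob(action).sum(-1)  # per-sample log-likelihood
    q_value = critic(obs, action)             # learned state-action value
    # Entropy bonus is switched on/off (or scaled) by the heuristic signal,
    # so maximization of entropy is bounded to heuristic-selected states.
    entropy_bonus = alpha * heuristic_signal * (-log_prob)
    return -(q_value + entropy_bonus).mean()
```

Under this reading, the heuristic information acts as an interpretable gate on exploration rather than as a reward-shaping term; how the paper actually formulates the bound should be taken from the paper itself.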

Cite

Text

Kim et al. "Stochastic Policy Optimization with Heuristic Information for Robot Learning." Conference on Robot Learning, 2021.

Markdown

[Kim et al. "Stochastic Policy Optimization with Heuristic Information for Robot Learning." Conference on Robot Learning, 2021.](https://mlanthology.org/corl/2021/kim2021corl-stochastic/)

BibTeX

@inproceedings{kim2021corl-stochastic,
  title     = {{Stochastic Policy Optimization with Heuristic Information for Robot Learning}},
  author    = {Kim, Seonghyun and Jang, Ingook and Noh, Samyeul and Kim, Hyunseok},
  booktitle = {Conference on Robot Learning},
  year      = {2021},
  pages     = {1465--1474},
  volume    = {164},
  url       = {https://mlanthology.org/corl/2021/kim2021corl-stochastic/}
}