Stochastic Policy Optimization with Heuristic Information for Robot Learning
Abstract
Stochastic policy-based deep reinforcement learning (RL) approaches have achieved remarkable success on continuous control tasks. However, applying these methods to manipulation tasks remains challenging, since the actuators of a robot manipulator require high-dimensional continuous action spaces. In this paper, we propose exploration-bounded exploration actor-critic (EBE-AC), a novel deep RL approach that combines stochastic policy optimization with interpretable human knowledge. The human knowledge is defined as heuristic information based on both physical relationships between the robot and objects and binary signals indicating whether the robot has reached certain states. The proposed approach, EBE-AC, combines an off-policy actor-critic algorithm with entropy maximization based on this heuristic information. On a robotic manipulation task, we demonstrate that EBE-AC outperforms prior state-of-the-art off-policy actor-critic deep RL algorithms in terms of sample efficiency. In addition, we found that EBE-AC can easily be combined with latent information, which further improved sample efficiency and robustness.
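The abstract does not give the update rule, but the idea of "entropy maximization based on heuristic information" can be sketched as an off-policy, SAC-style actor loss whose entropy bonus is modulated by a binary heuristic signal. The sketch below is a minimal, hypothetical illustration in PyTorch; the names (`GaussianActor`, `QCritic`, `heuristic_signal`, `alpha`, `beta`) and the specific weighting scheme are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: a SAC-style off-policy actor-critic in which the
# entropy bonus is boosted on states flagged by a binary heuristic signal.

class GaussianActor(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * act_dim),
        )

    def forward(self, obs):
        mean, log_std = self.net(obs).chunk(2, dim=-1)
        std = log_std.clamp(-5, 2).exp()
        dist = torch.distributions.Normal(mean, std)
        action = dist.rsample()                   # reparameterized sample
        log_prob = dist.log_prob(action).sum(-1)  # per-sample log-likelihood
        return action, log_prob


class QCritic(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)


def actor_loss(actor, critic, obs, heuristic_signal, alpha=0.2, beta=0.1):
    """Hypothetical actor objective: the usual entropy-regularized term, with
    an extra entropy bonus (beta) on states flagged by the heuristic signal."""
    action, log_prob = actor(obs)
    q_value = critic(obs, action)
    entropy_weight = alpha + beta * heuristic_signal  # encourage exploration where flagged
    return (entropy_weight * log_prob - q_value).mean()


# Example usage with random observations and binary heuristic signals.
obs = torch.randn(32, 10)
signal = torch.randint(0, 2, (32,)).float()
actor, critic = GaussianActor(10, 4), QCritic(10, 4)
actor_loss(actor, critic, obs, signal).backward()
```

Under this reading, the heuristic binary signals simply rescale the per-state entropy coefficient, so exploration is emphasized around states the human knowledge marks as relevant; how the paper actually couples the heuristic information to the objective may differ.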
Cite
Text
Kim et al. "Stochastic Policy Optimization with Heuristic Information for Robot Learning." Conference on Robot Learning, 2021.
Markdown
[Kim et al. "Stochastic Policy Optimization with Heuristic Information for Robot Learning." Conference on Robot Learning, 2021.](https://mlanthology.org/corl/2021/kim2021corl-stochastic/)
BibTeX
@inproceedings{kim2021corl-stochastic,
  title = {{Stochastic Policy Optimization with Heuristic Information for Robot Learning}},
  author = {Kim, Seonghyun and Jang, Ingook and Noh, Samyeul and Kim, Hyunseok},
  booktitle = {Conference on Robot Learning},
  year = {2021},
  pages = {1465-1474},
  volume = {164},
  url = {https://mlanthology.org/corl/2021/kim2021corl-stochastic/}
}