Guided Exploration in Reinforcement Learning via Monte Carlo Critic Optimization

Abstract

Deep deterministic off-policy algorithms have been applied effectively to challenging continuous control problems. However, current approaches commonly explore with random noise, a method with several weaknesses: it must be tuned manually for each task, and it is not recalibrated as training progresses. We address these challenges by proposing a novel guided exploration method that uses a differential directional controller to incorporate scalable exploratory action correction. The controller is realized as an ensemble of Monte Carlo Critics that provides the exploratory direction. The proposed method improves on the traditional exploration scheme by adjusting exploration dynamically over the course of training. We then present a novel algorithm that exploits the proposed directional controller to modify both the policy and the critic. The presented algorithm outperforms modern algorithms across a variety of problems from the DMControl suite.
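The abstract leaves the exact mechanics to the paper, but the core idea can be sketched: an ensemble of Monte Carlo critics scores state-action pairs, and the gradient of the ensemble value with respect to the action supplies a direction along which the policy's action is corrected before execution. The PyTorch sketch below illustrates one plausible reading of that scheme; it is not the paper's implementation, and all names and hyperparameters (`MCCritic`, `guided_action`, `step_size`, `max_norm`) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MCCritic(nn.Module):
    """One member of a Monte Carlo critic ensemble (hypothetical sketch)."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def guided_action(critics, state, action, step_size=0.1, max_norm=0.3):
    """Correct a deterministic policy action along the direction suggested
    by the critic ensemble. A minimal sketch of guided exploration, not the
    paper's code: the gradient of the mean ensemble value w.r.t. the action
    is taken as the exploratory direction; `step_size` and `max_norm` are
    assumed hyperparameters that scale and bound the correction.
    """
    action = action.detach().requires_grad_(True)
    values = torch.stack([c(state, action) for c in critics])  # (K, B, 1)
    values.mean(dim=0).sum().backward()          # per-sample gradients
    direction = action.grad                      # (B, action_dim)
    # Clip the correction norm so exploration stays bounded.
    norm = direction.norm(dim=-1, keepdim=True).clamp(min=1e-8)
    direction = direction / norm * norm.clamp(max=max_norm)
    return (action + step_size * direction).detach().clamp(-1.0, 1.0)
```

Under this reading, the correction is "scalable" in the sense that `step_size` can be annealed or adapted during training, which is how a directional controller could change exploration dynamically rather than injecting fixed-variance noise.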

Cite

Text

Kuznetsov. "Guided Exploration in Reinforcement Learning via Monte Carlo Critic Optimization." ICML 2022 Workshops: DARL, 2022.

Markdown

[Kuznetsov. "Guided Exploration in Reinforcement Learning via Monte Carlo Critic Optimization." ICML 2022 Workshops: DARL, 2022.](https://mlanthology.org/icmlw/2022/kuznetsov2022icmlw-guided/)

BibTeX

@inproceedings{kuznetsov2022icmlw-guided,
  title     = {{Guided Exploration in Reinforcement Learning via Monte Carlo Critic Optimization}},
  author    = {Kuznetsov, Igor},
  booktitle = {ICML 2022 Workshops: DARL},
  year      = {2022},
  url       = {https://mlanthology.org/icmlw/2022/kuznetsov2022icmlw-guided/}
}