Constrained Policy Improvement for Efficient Reinforcement Learning

Sarafian, Elad; Tamar, Aviv; Kraus, Sarit

doi:10.24963/IJCAI.2020/396

IJCAI 2020 pp. 2863-2871

Abstract

We propose a policy improvement algorithm for Reinforcement Learning (RL) termed Rerouted Behavior Improvement (RBI). RBI is designed to take into account the evaluation errors of the Q-function. Such errors are common in RL when learning the Q-value from finite experience data. Greedy policies or even constrained policy optimization algorithms that ignore these errors may suffer from an improvement penalty (i.e., a policy impairment). To reduce the penalty, the idea of RBI is to attenuate rapid policy changes to actions that were rarely sampled. This approach is shown to avoid catastrophic performance degradation and reduce regret when learning from a batch of transition samples. Through a two-armed bandit example, we show that it also increases data efficiency when the optimal action has a high variance. We evaluate RBI in two tasks in the Atari Learning Environment: (1) learning from observations of multiple behavior policies and (2) iterative RL. Our results demonstrate the advantage of RBI over greedy policies and other constrained policy optimization algorithms both in learning from observations and in RL tasks.
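The idea of attenuating policy changes toward rarely sampled actions can be illustrated with a small sketch. The abstract does not state the exact constraint, so the following is an assumption for illustration only: the new policy `pi` is kept within a multiplicative band around the behavior policy `beta`, and probability mass is moved toward the actions with the highest estimated Q-values until the band is exhausted. The function name `rerouted_policy` and the bounds `c_min=0.5`, `c_max=1.5` are hypothetical choices, not values taken from this page.

```python
import numpy as np

def rerouted_policy(beta, q_values, c_min=0.5, c_max=1.5):
    """Illustrative sketch (not the authors' code): the greediest policy
    satisfying c_min * beta <= pi <= c_max * beta for a single state.

    Assumes beta sums to 1 and c_min <= 1 <= c_max.
    """
    beta = np.asarray(beta, dtype=float)
    q_values = np.asarray(q_values, dtype=float)

    # Every action keeps at least a fraction c_min of its behavior probability.
    pi = c_min * beta
    remaining = 1.0 - pi.sum()  # probability mass still to be assigned

    # Hand out the remaining mass to actions in order of estimated value,
    # capping each action at c_max times its behavior probability.
    for a in np.argsort(-q_values):
        add = min((c_max - c_min) * beta[a], remaining)
        pi[a] += add
        remaining -= add
        if remaining <= 0:
            break
    return pi

# Hypothetical single-state example: the third action has the highest
# estimated value but was sampled only 10% of the time.
beta = np.array([0.5, 0.4, 0.1])
q_values = np.array([0.0, 0.1, 1.0])
pi = rerouted_policy(beta, q_values)
```

In this example a greedy policy would put all probability on the third action, whereas the sketch raises it only from 0.10 to 0.15, because its value estimate rests on few samples. The leftover mass goes to the next-best action, which was sampled more often.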

Cite

Text

Sarafian et al. "Constrained Policy Improvement for Efficient Reinforcement Learning." International Joint Conference on Artificial Intelligence, 2020. doi:10.24963/IJCAI.2020/396

Markdown

[Sarafian et al. "Constrained Policy Improvement for Efficient Reinforcement Learning." International Joint Conference on Artificial Intelligence, 2020.](https://mlanthology.org/ijcai/2020/sarafian2020ijcai-constrained/) doi:10.24963/IJCAI.2020/396

BibTeX

@inproceedings{sarafian2020ijcai-constrained,
  title     = {{Constrained Policy Improvement for Efficient Reinforcement Learning}},
  author    = {Sarafian, Elad and Tamar, Aviv and Kraus, Sarit},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2020},
  pages     = {2863--2871},
  doi       = {10.24963/IJCAI.2020/396},
  url       = {https://mlanthology.org/ijcai/2020/sarafian2020ijcai-constrained/}
}