Theoretical Analysis of Efficiency and Robustness of SoftMax and Gap-Increasing Operators in Reinforcement Learning
Abstract
In this paper, we propose and analyze conservative value iteration, which unifies value iteration, soft value iteration, advantage learning, and dynamic policy programming. Our analysis shows that algorithms using a combination of gap-increasing and max operators are resilient to stochastic errors, but not to non-stochastic errors. In contrast, algorithms using a softmax operator without a gap-increasing operator are less susceptible to all types of errors, but may display poor asymptotic performance. Algorithms using a combination of gap-increasing and softmax operators are much more effective and may asymptotically outperform algorithms with the max operator. Not only do these theoretical results provide a deep understanding of various reinforcement learning algorithms, but they also highlight the effectiveness of gap-increasing operators, as well as the limitations of traditional greedy value updates by the max operator.
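The abstract contrasts greedy value updates via the max operator with updates via a softmax operator. A minimal tabular sketch of that contrast is below, assuming a Boltzmann-weighted softmax backup and a tiny hand-made two-state MDP (both are illustrative assumptions, not the paper's experimental setup); the softmax backup lower-bounds the max backup and approaches it as the inverse temperature grows.

```python
import numpy as np

def boltzmann_softmax(q, beta):
    """Boltzmann-weighted average of action values; tends to max(q) as beta -> inf."""
    w = np.exp(beta * (q - q.max()))  # shift by max for numerical stability
    return float((w / w.sum()) @ q)

# Hypothetical 2-state, 2-action MDP for illustration only.
# P[s, a, s'] = transition probability, R[s, a] = reward.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[1.0, 0.0], [0.0, 1.0]]])
R = np.array([[1.0, 0.0], [0.0, 1.0]])
gamma = 0.9

def value_iteration(backup, iters=200):
    """Run tabular value iteration with the given per-state backup operator."""
    V = np.zeros(2)
    for _ in range(iters):
        Q = R + gamma * P @ V                      # Q[s, a] one-step lookahead
        V = np.array([backup(Q[s]) for s in range(2)])
    return V

V_max = value_iteration(lambda q: q.max())                       # greedy (max) updates
V_soft = value_iteration(lambda q: boltzmann_softmax(q, beta=5.0))  # softmax updates
```

Starting both iterations from the same zero initialization, every softmax iterate is bounded above by the corresponding max iterate, illustrating the asymptotic-performance gap the abstract attributes to the softmax operator used alone.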
Cite
Text
Kozuno et al. "Theoretical Analysis of Efficiency and Robustness of SoftMax and Gap-Increasing Operators in Reinforcement Learning." Artificial Intelligence and Statistics, 2019.
Markdown
[Kozuno et al. "Theoretical Analysis of Efficiency and Robustness of SoftMax and Gap-Increasing Operators in Reinforcement Learning." Artificial Intelligence and Statistics, 2019.](https://mlanthology.org/aistats/2019/kozuno2019aistats-theoretical/)
BibTeX
@inproceedings{kozuno2019aistats-theoretical,
title = {{Theoretical Analysis of Efficiency and Robustness of SoftMax and Gap-Increasing Operators in Reinforcement Learning}},
author = {Kozuno, Tadashi and Uchibe, Eiji and Doya, Kenji},
booktitle = {Artificial Intelligence and Statistics},
year = {2019},
pages = {2995-3003},
volume = {89},
url = {https://mlanthology.org/aistats/2019/kozuno2019aistats-theoretical/}
}