Monte Carlo Tree Search for Policy Optimization
Abstract
Gradient-based methods are often used for policy optimization in deep reinforcement learning, despite being vulnerable to local optima and saddle points. Although gradient-free methods (e.g., genetic algorithms or evolution strategies) help mitigate these issues, poor initialization and local optima are still concerns in highly nonconvex spaces. This paper presents a method for policy optimization based on Monte-Carlo tree search and gradient-free optimization. Our method, called Monte-Carlo tree search for policy optimization (MCTSPO), provides a better exploration-exploitation trade-off through the use of the upper confidence bound heuristic. We demonstrate improved performance on reinforcement learning tasks with deceptive or sparse reward functions compared to popular gradient-based and deep genetic algorithm baselines.
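The abstract names the upper confidence bound heuristic as the mechanism behind MCTSPO's exploration-exploitation trade-off. The paper's algorithm itself is not reproduced on this page; purely as a rough illustration, the following is a minimal Python sketch of the standard UCB1 selection rule used in Monte Carlo tree search (the Node class, field names, and exploration constant are illustrative assumptions, not taken from the paper).

import math

class Node:
    """A search-tree node tracking visit count and accumulated return."""
    def __init__(self, children=None):
        self.children = children or []   # candidate expansions (e.g., policy perturbations)
        self.visits = 0                  # number of times this node has been selected
        self.total_return = 0.0          # sum of returns backed up through this node

    def mean_return(self):
        return self.total_return / self.visits if self.visits else 0.0

def ucb_select(parent, c=1.4):
    """Pick the child maximizing mean return plus an exploration bonus
    that shrinks as the child is visited more often (UCB1)."""
    def score(child):
        if child.visits == 0:
            return float("inf")          # always try unvisited children first
        bonus = c * math.sqrt(math.log(parent.visits) / child.visits)
        return child.mean_return() + bonus
    return max(parent.children, key=score)

During the selection phase of tree search, this rule decides which child to descend into at each node; the exploration constant c controls how strongly rarely visited children are favored over children with high observed returns.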
Cite
Text
Ma et al. "Monte Carlo Tree Search for Policy Optimization." International Joint Conference on Artificial Intelligence, 2019. doi:10.24963/IJCAI.2019/432
Markdown
[Ma et al. "Monte Carlo Tree Search for Policy Optimization." International Joint Conference on Artificial Intelligence, 2019.](https://mlanthology.org/ijcai/2019/ma2019ijcai-monte/) doi:10.24963/IJCAI.2019/432
BibTeX
@inproceedings{ma2019ijcai-monte,
title = {{Monte Carlo Tree Search for Policy Optimization}},
author = {Ma, Xiaobai and Driggs-Campbell, Katherine Rose and Zhang, Zongzhang and Kochenderfer, Mykel J.},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2019},
pages = {3116-3122},
doi = {10.24963/IJCAI.2019/432},
url = {https://mlanthology.org/ijcai/2019/ma2019ijcai-monte/}
}