Policy Tree: Adaptive Representation for Policy Gradient
Abstract
Much of the focus on finding good representations in reinforcement learning has been on learning complex non-linear predictors of value. Policy gradient algorithms, which directly represent the policy, often need fewer parameters to learn good policies. However, they typically employ a fixed parametric representation that may not be sufficient for complex domains. This paper introduces the Policy Tree algorithm, which can learn an adaptive representation of policy in the form of a decision tree over different instantiations of a base policy. Policy gradient is used both to optimize the parameters and to grow the tree by choosing splits that enable the maximum local increase in the expected return of the policy. Experiments show that this algorithm can choose genuinely helpful splits and significantly improve upon the commonly used linear Gibbs softmax policy, which we choose as our base policy.
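A minimal illustrative sketch (not the authors' implementation) of the two ingredients named in the abstract: a linear Gibbs softmax base policy and a decision-tree node that routes states to different instantiations of that base policy, each updated by a vanilla policy-gradient step. All names here (`GibbsSoftmaxPolicy`, `TreeNode`, the feature dimensions, the learning rate) are assumptions chosen for illustration only.

```python
import numpy as np

class GibbsSoftmaxPolicy:
    """Linear Gibbs (Boltzmann) softmax policy over a state feature vector."""

    def __init__(self, n_features, n_actions, rng=None):
        self.theta = np.zeros((n_actions, n_features))
        self.rng = rng or np.random.default_rng(0)

    def probs(self, phi):
        prefs = self.theta @ phi          # linear action preferences
        prefs -= prefs.max()              # subtract max for numerical stability
        e = np.exp(prefs)
        return e / e.sum()

    def sample(self, phi):
        return self.rng.choice(len(self.theta), p=self.probs(phi))

    def grad_log_prob(self, phi, a):
        """Gradient of log pi(a|phi) w.r.t. theta (REINFORCE-style)."""
        p = self.probs(phi)
        grad = -np.outer(p, phi)          # -pi(b|phi) * phi for every action b
        grad[a] += phi                    # +phi for the action actually taken
        return grad


class TreeNode:
    """Decision node splitting on one feature; each side holds its own base policy."""

    def __init__(self, split_dim, threshold, left_policy, right_policy):
        self.split_dim = split_dim
        self.threshold = threshold
        self.left = left_policy
        self.right = right_policy

    def leaf_for(self, phi):
        return self.left if phi[self.split_dim] <= self.threshold else self.right

    def sample(self, phi):
        return self.leaf_for(phi).sample(phi)


# Usage sketch: route a state to one of two leaf policies and update the active
# leaf with a plain policy-gradient step on a sampled return G (placeholder value).
phi = np.array([0.2, 0.8, 1.0])
node = TreeNode(split_dim=0, threshold=0.5,
                left_policy=GibbsSoftmaxPolicy(3, 2),
                right_policy=GibbsSoftmaxPolicy(3, 2))
a = node.sample(phi)
G = 1.0                                              # placeholder sampled return
leaf = node.leaf_for(phi)
leaf.theta += 0.1 * G * leaf.grad_log_prob(phi, a)   # alpha * G * grad log pi
```

In the paper's approach, splits are grown by choosing the one that yields the largest local increase in expected return; the sketch above only shows how a fixed split partitions states among independently parameterized base policies.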
Cite
Text
Das Gupta et al. "Policy Tree: Adaptive Representation for Policy Gradient." AAAI Conference on Artificial Intelligence, 2015. doi:10.1609/AAAI.V29I1.9613

Markdown

[Das Gupta et al. "Policy Tree: Adaptive Representation for Policy Gradient." AAAI Conference on Artificial Intelligence, 2015.](https://mlanthology.org/aaai/2015/gupta2015aaai-policy/) doi:10.1609/AAAI.V29I1.9613

BibTeX
@inproceedings{gupta2015aaai-policy,
title = {{Policy Tree: Adaptive Representation for Policy Gradient}},
author = {Das Gupta, Ujjwal and Talvitie, Erik and Bowling, Michael},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2015},
pages = {2547-2553},
doi = {10.1609/AAAI.V29I1.9613},
url = {https://mlanthology.org/aaai/2015/gupta2015aaai-policy/}
}