Optimal Decision Tree Policies for Markov Decision Processes

IJCAI 2023 pp. 5457-5465

doi:10.24963/IJCAI.2023/606 /ijcai/2023/vos2023ijcai-optimal/

Abstract

Interpretability of reinforcement learning policies is essential for many real-world tasks but learning such interpretable policies is a hard problem. Particularly, rule-based policies such as decision trees and rules lists are difficult to optimize due to their non-differentiability. While existing techniques can learn verifiable decision tree policies, there is no guarantee that the learners generate a policy that performs optimally. In this work, we study the optimization of size-limited decision trees for Markov Decision Processes (MPDs) and propose OMDTs: Optimal MDP Decision Trees. Given a user-defined size limit and MDP formulation, OMDT directly maximizes the expected discounted return for the decision tree using Mixed-Integer Linear Programming. By training optimal tree policies for different MDPs we empirically study the optimality gap for existing imitation learning techniques and find that they perform sub-optimally. We show that this is due to an inherent shortcoming of imitation learning, namely that complex policies cannot be represented using size-limited trees. In such cases, it is better to directly optimize the tree for expected return. While there is generally a trade-off between the performance and interpretability of machine learning models, we find that on small MDPs, depth 3 OMDTs often perform close to optimally.

PDF IJCAI Semantic Scholar

Cite

Text

Vos and Verwer. "Optimal Decision Tree Policies for Markov Decision Processes." International Joint Conference on Artificial Intelligence, 2023. doi:10.24963/IJCAI.2023/606

Markdown

[Vos and Verwer. "Optimal Decision Tree Policies for Markov Decision Processes." International Joint Conference on Artificial Intelligence, 2023.](https://mlanthology.org/ijcai/2023/vos2023ijcai-optimal/) doi:10.24963/IJCAI.2023/606

BibTeX

@inproceedings{vos2023ijcai-optimal,
  title     = {{Optimal Decision Tree Policies for Markov Decision Processes}},
  author    = {Vos, Daniël and Verwer, Sicco},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2023},
  pages     = {5457-5465},
  doi       = {10.24963/IJCAI.2023/606},
  url       = {https://mlanthology.org/ijcai/2023/vos2023ijcai-optimal/}
}