Action-Constrained Markov Decision Processes with Kullback-Leibler Cost
Abstract
This paper concerns the computation of optimal policies for Markov decision processes in which the one-step cost function contains a term that models Kullback-Leibler divergence with respect to nominal dynamics. This technique was introduced by Todorov in 2007, where it was shown under general conditions that the solution to the average-cost optimality equations reduces to a simple eigenvector problem. Since then, many authors have sought to apply this technique to control problems and to models of bounded rationality in economics. A crucial assumption is that the input process is essentially unconstrained. For example, if the nominal dynamics include randomness from nature (e.g., the impact of wind on a moving vehicle), then the optimal control solution does not respect the exogenous nature of this disturbance. This paper introduces a technique to solve a more general class of action-constrained MDPs. The main idea is to solve an entire parameterized family of MDPs, in which the parameter is a scalar weighting of the one-step cost or reward function. The approach is new and practical even in the original unconstrained formulation.
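For context, the eigenvector reduction referenced in the abstract (the unconstrained formulation due to Todorov, 2007, not the action-constrained extension developed in this paper) can be sketched concretely. The following is a minimal illustration under standard assumptions: a finite state space, a nominal transition matrix P0, and a state cost c, with the one-step cost equal to c(x) plus the KL divergence of the chosen kernel from P0. The function name and the toy chain are hypothetical.

```python
import numpy as np

def kl_cost_optimal_policy(P0, c):
    """Eigenvector reduction for the average-cost, KL-regularized MDP
    (the linearly solvable setting of Todorov, 2007).

    P0 : (n, n) nominal transition matrix (rows sum to 1)
    c  : (n,) state cost

    Returns the optimal average cost eta, the Perron eigenvector v,
    and the optimal "twisted" transition matrix Pstar.
    """
    # Scaled kernel: Phat(x, y) = exp(-c(x)) * P0(x, y).
    Phat = np.exp(-c)[:, None] * P0

    # Average-cost optimality equation becomes the eigenvector problem
    # lambda * v = Phat @ v, with v the positive Perron eigenvector.
    eigvals, eigvecs = np.linalg.eig(Phat)
    k = np.argmax(eigvals.real)
    lam = eigvals[k].real
    v = np.abs(eigvecs[:, k].real)   # Perron vector, positive up to scaling

    # Optimal average cost eta = -log(lambda), and optimal kernel
    # Pstar(x, y) = P0(x, y) * v(y) / (P0 v)(x).
    eta = -np.log(lam)
    Pstar = P0 * v[None, :]
    Pstar /= Pstar.sum(axis=1, keepdims=True)
    return eta, v, Pstar

if __name__ == "__main__":
    # Toy example: 3-state chain with a nominal random-walk kernel.
    P0 = np.array([[0.5, 0.5, 0.0],
                   [0.25, 0.5, 0.25],
                   [0.0, 0.5, 0.5]])
    c = np.array([1.0, 0.0, 2.0])   # hypothetical state cost
    eta, v, Pstar = kl_cost_optimal_policy(P0, c)
    print("average cost:", eta)
    print("optimal kernel:\n", Pstar)
```

Note that the resulting kernel Pstar redistributes probability freely over all transitions that P0 allows; the paper's contribution addresses the case where some of that randomness is exogenous and must not be re-optimized.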
Cite
Text
Busic and Meyn. "Action-Constrained Markov Decision Processes with Kullback-Leibler Cost." Annual Conference on Computational Learning Theory, 2018.
Markdown
[Busic and Meyn. "Action-Constrained Markov Decision Processes with Kullback-Leibler Cost." Annual Conference on Computational Learning Theory, 2018.](https://mlanthology.org/colt/2018/busic2018colt-action/)
BibTeX
@inproceedings{busic2018colt-action,
title = {{Action-Constrained Markov Decision Processes with Kullback-Leibler Cost}},
author = {Busic, Ana and Meyn, Sean P.},
booktitle = {Annual Conference on Computational Learning Theory},
year = {2018},
pages = {1431-1444},
url = {https://mlanthology.org/colt/2018/busic2018colt-action/}
}