Action-Constrained Markov Decision Processes with Kullback-Leibler Cost

Abstract

This paper concerns computation of optimal policies in which the one-step cost function contains a term that models Kullback-Leibler divergence with respect to nominal dynamics. This technique was introduced by Todorov in 2007, where it was shown under general conditions that the solution to the average-cost optimality equations reduces to a simple eigenvector problem. Since then, many authors have sought to apply this technique to control problems and to models of bounded rationality in economics. A crucial assumption is that the input process is essentially unconstrained. For example, if the nominal dynamics include randomness from nature (e.g., the impact of wind on a moving vehicle), then the optimal control solution does not respect the exogenous nature of this disturbance. This paper introduces a technique to solve a more general class of action-constrained MDPs. The main idea is to solve an entire parameterized family of MDPs, in which the parameter is a scalar weighting of the one-step cost or reward function. The approach is new and practical even in the original unconstrained formulation.
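A minimal numerical sketch of the unconstrained eigenvector reduction referenced above (Todorov, 2007), assuming the standard formulation in which the one-step cost is a state cost plus a KL divergence from the nominal dynamics; the names `P0`, `c`, and the scalar weight `zeta` (swept over a parameterized family of MDPs) are illustrative, not the paper's notation:

```python
import numpy as np

def solve_kl_mdp(P0, c, zeta=1.0):
    """Average-cost solution of the unconstrained KL-cost MDP.

    P0   : (n, n) nominal transition matrix, rows sum to 1
    c    : (n,) state cost
    zeta : scalar weight on the state cost (the parameter varied
           across the family of MDPs)
    Returns (eta, h, P_opt): optimal average cost, relative value
    function, and optimal (twisted) transition matrix.
    """
    # Twisted kernel: G(x, x') = exp(-zeta * c(x)) * P0(x, x').
    G = np.exp(-zeta * c)[:, None] * P0

    # Perron-Frobenius eigenpair of G: the average-cost optimality
    # equation reduces to G z = lambda z with z = exp(-h).
    eigvals, eigvecs = np.linalg.eig(G)
    k = np.argmax(eigvals.real)
    lam = eigvals[k].real
    z = np.abs(eigvecs[:, k].real)        # positive Perron eigenvector

    eta = -np.log(lam)                    # optimal average cost
    h = -np.log(z)
    h -= h.min()                          # normalize relative value function

    # Optimal randomized transition law: nominal dynamics twisted by z.
    P_opt = P0 * z[None, :]
    P_opt /= P_opt.sum(axis=1, keepdims=True)
    return eta, h, P_opt

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 5
    P0 = rng.random((n, n))
    P0 /= P0.sum(axis=1, keepdims=True)   # nominal dynamics
    c = rng.random(n)                     # state cost

    # One eigenvector problem per member of the parameterized family.
    for zeta in (0.5, 1.0, 2.0):
        eta, h, P_opt = solve_kl_mdp(P0, c, zeta)
        print(f"zeta={zeta:.1f}  average cost eta={eta:.4f}")
```

This covers only the unconstrained case treated by Todorov; the action-constrained extension developed in the paper is not reproduced here.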

Cite

Text

Busic and Meyn. "Action-Constrained Markov Decision Processes with Kullback-Leibler Cost." Annual Conference on Computational Learning Theory, 2018.

Markdown

[Busic and Meyn. "Action-Constrained Markov Decision Processes with Kullback-Leibler Cost." Annual Conference on Computational Learning Theory, 2018.](https://mlanthology.org/colt/2018/busic2018colt-action/)

BibTeX

@inproceedings{busic2018colt-action,
  title     = {{Action-Constrained Markov Decision Processes with Kullback-Leibler Cost}},
  author    = {Busic, Ana and Meyn, Sean P.},
  booktitle = {Annual Conference on Computational Learning Theory},
  year      = {2018},
  pages     = {1431--1444},
  url       = {https://mlanthology.org/colt/2018/busic2018colt-action/}
}