Stationary Deterministic Policies for Constrained MDPs with Multiple Rewards, Costs, and Discount Factors

Abstract

We consider the problem of policy optimization for a resource-limited agent with multiple time-dependent objectives, represented as an MDP with multiple discount factors in the objective function and constraints. We show that limiting search to stationary deterministic policies, coupled with a novel problem reduction to mixed integer programming, yields an algorithm for finding such policies that is computationally feasible, where no such algorithm has heretofore been identified. In the simpler case where the constrained MDP has a single discount factor, our technique provides a new way of finding an optimal deterministic policy, where previous methods could only find randomized policies. We analyze the properties of our approach and describe implementation results.
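To make the policy class concrete, here is a minimal illustrative sketch (toy numbers, not from the paper) of the problem the abstract describes: a constrained MDP with one reward stream and one cost stream, each under its own discount factor, optimized over stationary deterministic policies. It uses brute-force enumeration, which is exponential in the number of states; the paper's contribution is a mixed-integer-programming reduction that avoids exactly this enumeration.

```python
# Toy constrained MDP: 2 states, 2 actions, separate discount factors for
# the reward and cost streams (all numbers are made up for illustration).
import itertools

S, A = 2, 2                      # number of states and actions
gamma_r, gamma_c = 0.9, 0.8      # discount factors for reward and cost
# P[s][a][s'] = transition probability
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.7, 0.3], [0.1, 0.9]]]
R = [[1.0, 0.0], [0.0, 2.0]]     # reward r(s, a)
C = [[0.0, 1.0], [1.0, 0.5]]     # cost c(s, a)
alpha = [0.5, 0.5]               # initial state distribution
budget = 3.0                     # bound on expected discounted cost

def evaluate(pi, vals, gamma):
    """Expected discounted value of deterministic policy pi under the
    per-(s, a) values `vals`: solve (I - gamma * P_pi) v = vals_pi,
    here a 2x2 linear system solved by Cramer's rule."""
    p0, p1 = P[0][pi[0]], P[1][pi[1]]
    a11, a12 = 1 - gamma * p0[0], -gamma * p0[1]
    a21, a22 = -gamma * p1[0], 1 - gamma * p1[1]
    b1, b2 = vals[0][pi[0]], vals[1][pi[1]]
    det = a11 * a22 - a12 * a21
    v0 = (b1 * a22 - b2 * a12) / det
    v1 = (a11 * b2 - a21 * b1) / det
    return alpha[0] * v0 + alpha[1] * v1

best = None  # (policy, discounted reward, discounted cost)
for pi in itertools.product(range(A), repeat=S):
    cost = evaluate(pi, C, gamma_c)
    if cost <= budget:                   # feasible under the cost constraint
        rew = evaluate(pi, R, gamma_r)
        if best is None or rew > best[1]:
            best = (pi, rew, cost)
print(best)
```

Note that a stationary deterministic policy is just a state-to-action map, so the search space here has |A|^|S| elements; the paper's MILP reduction instead encodes the determinism restriction with integer variables over occupation measures, which is what makes the approach computationally feasible at realistic sizes.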

Cite

Text

Dolgov and Durfee. "Stationary Deterministic Policies for Constrained MDPs with Multiple Rewards, Costs, and Discount Factors." International Joint Conference on Artificial Intelligence, 2005.

Markdown

[Dolgov and Durfee. "Stationary Deterministic Policies for Constrained MDPs with Multiple Rewards, Costs, and Discount Factors." International Joint Conference on Artificial Intelligence, 2005.](https://mlanthology.org/ijcai/2005/dolgov2005ijcai-stationary/)

BibTeX

@inproceedings{dolgov2005ijcai-stationary,
  title     = {{Stationary Deterministic Policies for Constrained MDPs with Multiple Rewards, Costs, and Discount Factors}},
  author    = {Dolgov, Dmitri A. and Durfee, Edmund H.},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2005},
  pages     = {1326--1331},
  url       = {https://mlanthology.org/ijcai/2005/dolgov2005ijcai-stationary/}
}