Targeting Specific Distributions of Trajectories in MDPs

Abstract

We define TTD-MDPs, a novel class of Markov decision processes where the traditional goal of an agent is changed from finding an optimal trajectory through a state space to realizing a specified distribution of trajectories through the space. After motivating this formulation, we show how to convert a traditional MDP into a TTD-MDP. We derive an algorithm for finding non-deterministic policies by constructing a trajectory tree that allows us to compute locally-consistent policies. We specify the necessary conditions for solving the problem exactly and present a heuristic algorithm for constructing policies when an exact answer is impossible or impractical. We present empirical results for our algorithm in two domains: a synthetic grid world and stories in an interactive drama or game.
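The core idea of locally-consistent policies can be illustrated with a minimal sketch (an assumption-laden toy, not the paper's algorithm verbatim): given a trajectory tree with deterministic action-to-child transitions and a target probability on each complete trajectory (leaf), set each node's action distribution proportional to the target mass of the subtree that action leads to. The products of these local probabilities along a root-to-leaf path then telescope to the leaf's target probability.

```python
def subtree_mass(node):
    """Total target probability of trajectories below this node.
    A leaf is a float (the target probability of one complete trajectory);
    an internal node is a dict mapping action -> child subtree."""
    if isinstance(node, dict):
        return sum(subtree_mass(child) for child in node.values())
    return node

def local_policy(node):
    """Stochastic policy at an internal node: each action's probability
    is proportional to the target mass of the subtree it reaches.
    Assumes deterministic transitions, so an exact match is possible."""
    masses = {a: subtree_mass(child) for a, child in node.items()}
    total = sum(masses.values())
    return {a: m / total for a, m in masses.items()}

# Hypothetical two-level trajectory tree: target distribution puts
# probability 0.5, 0.1, 0.4 on the three complete trajectories.
tree = {"left": {"up": 0.5, "down": 0.1}, "right": 0.4}
root_policy = local_policy(tree)            # {'left': 0.6, 'right': 0.4}
left_policy = local_policy(tree["left"])    # {'up': 5/6, 'down': 1/6}

# Local probabilities multiply back to the target: 0.6 * (0.5/0.6) = 0.5.
```

With stochastic transitions the product may not factor this cleanly, which is where the paper's exactness conditions and heuristic come in.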

Cite

Text

Roberts et al. "Targeting Specific Distributions of Trajectories in MDPs." AAAI Conference on Artificial Intelligence, 2006.

Markdown

[Roberts et al. "Targeting Specific Distributions of Trajectories in MDPs." AAAI Conference on Artificial Intelligence, 2006.](https://mlanthology.org/aaai/2006/roberts2006aaai-targeting/)

BibTeX

@inproceedings{roberts2006aaai-targeting,
  title     = {{Targeting Specific Distributions of Trajectories in MDPs}},
  author    = {Roberts, David L. and Nelson, Mark J. and Isbell, Jr., Charles Lee and Mateas, Michael and Littman, Michael L.},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2006},
  pages     = {1213--1218},
  url       = {https://mlanthology.org/aaai/2006/roberts2006aaai-targeting/}
}