Targeting Specific Distributions of Trajectories in MDPs

Abstract

We define TTD-MDPs, a novel class of Markov decision processes where the traditional goal of an agent is changed from finding an optimal trajectory through a state space to realizing a specified distribution of trajectories through the space. After motivating this formulation, we show how to convert a traditional MDP into a TTD-MDP. We derive an algorithm for finding non-deterministic policies by constructing a trajectory tree that allows us to compute locally-consistent policies. We specify the necessary conditions for solving the problem exactly and present a heuristic algorithm for constructing policies when an exact answer is impossible or impractical. We present empirical results for our algorithm in two domains: a synthetic grid world and stories in an interactive drama or game.
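The core idea of locally-consistent policies can be illustrated with a minimal sketch (an assumption-laden toy, not the paper's algorithm verbatim): given a trajectory tree with deterministic action-to-child transitions and a target probability on each complete trajectory (leaf), set each node's action distribution proportional to the target mass of the subtree that action leads to. The products of these local probabilities along a root-to-leaf path then telescope to the leaf's target probability.

```python
def subtree_mass(node):
    """Total target probability of trajectories below this node.
    A leaf is a float (the target probability of one complete trajectory);
    an internal node is a dict mapping action -> child subtree."""
    if isinstance(node, dict):
        return sum(subtree_mass(child) for child in node.values())
    return node

def local_policy(node):
    """Stochastic policy at an internal node: each action's probability
    is proportional to the target mass of the subtree it reaches.
    Assumes deterministic transitions, so an exact match is possible."""
    masses = {a: subtree_mass(child) for a, child in node.items()}
    total = sum(masses.values())
    return {a: m / total for a, m in masses.items()}

# Hypothetical two-level trajectory tree: target distribution puts
# probability 0.5, 0.1, 0.4 on the three complete trajectories.
tree = {"left": {"up": 0.5, "down": 0.1}, "right": 0.4}
root_policy = local_policy(tree)            # {'left': 0.6, 'right': 0.4}
left_policy = local_policy(tree["left"])    # {'up': 5/6, 'down': 1/6}

# Local probabilities multiply back to the target: 0.6 * (0.5/0.6) = 0.5.
```

With stochastic transitions the product may not factor this cleanly, which is where the paper's exactness conditions and heuristic come in.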

Cite

Text

Roberts et al. "Targeting Specific Distributions of Trajectories in MDPs." AAAI Conference on Artificial Intelligence, 2006.

Markdown

[Roberts et al. "Targeting Specific Distributions of Trajectories in MDPs." AAAI Conference on Artificial Intelligence, 2006.](https://mlanthology.org/aaai/2006/roberts2006aaai-targeting/)

BibTeX

@inproceedings{roberts2006aaai-targeting,
  title     = {{Targeting Specific Distributions of Trajectories in MDPs}},
  author    = {Roberts, David L. and Nelson, Mark J. and Isbell, Jr., Charles Lee and Mateas, Michael and Littman, Michael L.},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2006},
  pages     = {1213--1218},
  url       = {https://mlanthology.org/aaai/2006/roberts2006aaai-targeting/}
}