Targeting Specific Distributions of Trajectories in MDPs
Abstract
We define TTD-MDPs, a novel class of Markov decision processes where the traditional goal of an agent is changed from finding an optimal trajectory through a state space to realizing a specified distribution of trajectories through the space. After motivating this formulation, we show how to convert a traditional MDP into a TTD-MDP. We derive an algorithm for finding non-deterministic policies by constructing a trajectory tree that allows us to compute locally-consistent policies. We specify the necessary conditions for solving the problem exactly and present a heuristic algorithm for constructing policies when an exact answer is impossible or impractical. We present empirical results for our algorithm in two domains: a synthetic grid world and stories in an interactive drama or game.
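The abstract names the steps but does not give the per-node computation. The following Python sketch is our own illustration of one node of a trajectory-tree computation, under these assumptions: each tree node is a trajectory prefix, a prefix's target mass is the sum of the target probabilities of the complete trajectories below it, and the desired branching at a node is P(child)/P(node). The mixture over actions is then chosen so that the induced child distribution is as close as possible to that target in squared (L2) error, solved here with projected gradient descent. The L2 objective, the solver, and every function name (`node_probabilities`, `desired_child_distribution`, `project_to_simplex`, `local_policy`) are hypothetical stand-ins, not the paper's exact heuristic.

```python
from typing import Dict, List, Tuple

# Assumption: a trajectory is a tuple of state labels; a tree node is a prefix of one.
Trajectory = Tuple[str, ...]


def node_probabilities(targets: Dict[Trajectory, float]) -> Dict[Trajectory, float]:
    """Mass of every prefix = sum of target mass of the complete trajectories below it."""
    probs: Dict[Trajectory, float] = {}
    for traj, p in targets.items():
        for k in range(1, len(traj) + 1):
            prefix = traj[:k]
            probs[prefix] = probs.get(prefix, 0.0) + p
    return probs


def desired_child_distribution(node: Trajectory, children: List[Trajectory],
                               probs: Dict[Trajectory, float]) -> List[float]:
    """P(child | node) = P(child) / P(node); assumes the node has positive target mass."""
    total = probs[node]
    return [probs.get(c, 0.0) / total for c in children]


def project_to_simplex(v: List[float]) -> List[float]:
    """Euclidean projection onto the probability simplex (sort-based method)."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for i, ui in enumerate(u, start=1):
        cumulative += ui
        t = (cumulative - 1.0) / i
        if ui - t > 0:
            theta = t  # the last index satisfying the condition fixes the threshold
    return [max(x - theta, 0.0) for x in v]


def local_policy(transitions: List[List[float]], target: List[float],
                 steps: int = 5000, lr: float = 0.1) -> List[float]:
    """Action mixture whose induced child distribution is closest (L2) to target.

    transitions[a][j] = probability that action a moves the node to child j.
    Minimizes ||sum_a pi[a] * transitions[a] - target||^2 over the simplex
    by projected gradient descent (a stand-in solver, not the paper's method).
    """
    n_actions = len(transitions)
    n_children = len(target)
    pi = [1.0 / n_actions] * n_actions  # start from the uniform mixture
    for _ in range(steps):
        induced = [sum(pi[a] * transitions[a][j] for a in range(n_actions))
                   for j in range(n_children)]
        residual = [induced[j] - target[j] for j in range(n_children)]
        grad = [2.0 * sum(transitions[a][j] * residual[j] for j in range(n_children))
                for a in range(n_actions)]
        pi = project_to_simplex([pi[a] - lr * grad[a] for a in range(n_actions)])
    return pi


if __name__ == "__main__":
    # Hypothetical example: two complete trajectories leaving the root state s0.
    targets = {("s0", "s1"): 0.7, ("s0", "s2"): 0.3}
    probs = node_probabilities(targets)
    target = desired_child_distribution(("s0",), [("s0", "s1"), ("s0", "s2")], probs)
    # Hypothetical transition model: action 0 mostly reaches s1, action 1 mostly s2.
    T = [[0.9, 0.1], [0.2, 0.8]]
    print(local_policy(T, target))  # approximately [0.714, 0.286], i.e. [5/7, 2/7]
```

When some mixture of actions reproduces the target branching exactly, as in this two-action example, the residual goes to zero; otherwise the sketch returns the closest mixture it finds, which is one simple way to handle the inexact case the abstract mentions.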
Cite
Roberts et al. "Targeting Specific Distributions of Trajectories in MDPs." AAAI Conference on Artificial Intelligence, 2006.
[Roberts et al. "Targeting Specific Distributions of Trajectories in MDPs." AAAI Conference on Artificial Intelligence, 2006.](https://mlanthology.org/aaai/2006/roberts2006aaai-targeting/)
@inproceedings{roberts2006aaai-targeting,
title = {{Targeting Specific Distributions of Trajectories in MDPs}},
author = {Roberts, David L. and Nelson, Mark J. and Isbell, Jr., Charles Lee and Mateas, Michael and Littman, Michael L.},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2006},
pages = {1213--1218},
url = {https://mlanthology.org/aaai/2006/roberts2006aaai-targeting/}
}