Authorial Idioms for Target Distributions in TTD-MDPs

Abstract

In designing Markov Decision Processes (MDPs), one must define the world, its dynamics, a set of actions, and a reward function. MDPs are often applied in situations where there is no clear choice of reward function, and in these cases significant care must be taken to construct a reward function that induces the desired behavior. In this paper, we consider an analogous design problem: crafting a target distribution for Targeted Trajectory Distribution MDPs (TTD-MDPs). TTD-MDPs, an extension of MDPs that provides variety of experience during repeated execution, produce probabilistic policies that minimize divergence from a target distribution over trajectories of the underlying MDP. Here, we present a brief overview of TTD-MDPs along with approaches for constructing target distributions. We then present a novel authorial idiom for creating target distributions using prototype trajectories, and evaluate these approaches on a drama manager for an interactive game.
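To illustrate the core idea, the following is a minimal sketch of how a TTD-MDP policy can be derived locally when transitions are deterministic, i.e., each action at a trajectory node leads to exactly one child trajectory. Under that assumption, the action distribution that exactly reproduces the target is simply the children's target probabilities, renormalized at the current node. The function name and example numbers are illustrative, not taken from the paper.

```python
# Hedged sketch: local TTD-MDP policy under deterministic transitions.
# Assumption: each available action leads to exactly one child trajectory,
# so matching the target distribution reduces to renormalizing the
# target mass assigned to each reachable child.

def local_policy(child_targets):
    """Map {action: target probability of resulting child trajectory}
    to {action: probability of taking that action} at this node."""
    total = sum(child_targets.values())
    if total == 0:
        # No target mass on any child: fall back to a uniform choice.
        n = len(child_targets)
        return {a: 1.0 / n for a in child_targets}
    return {a: p / total for a, p in child_targets.items()}

# Example: three actions lead to children carrying target mass 0.1, 0.3, 0.1;
# the resulting policy picks the middle action with probability 0.6.
policy = local_policy({"a1": 0.1, "a2": 0.3, "a3": 0.1})
```

With stochastic transitions this local renormalization is no longer exact, which is why the TTD-MDP work frames the problem as minimizing divergence from the target distribution rather than matching it outright.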

Cite

Text

Roberts et al. "Authorial Idioms for Target Distributions in TTD-MDPs." AAAI Conference on Artificial Intelligence, 2007.

Markdown

[Roberts et al. "Authorial Idioms for Target Distributions in TTD-MDPs." AAAI Conference on Artificial Intelligence, 2007.](https://mlanthology.org/aaai/2007/roberts2007aaai-authorial/)

BibTeX

@inproceedings{roberts2007aaai-authorial,
  title     = {{Authorial Idioms for Target Distributions in TTD-MDPs}},
  author    = {Roberts, David L. and Bhat, Sooraj and {St. Clair}, Kenneth and Isbell, Jr., Charles Lee},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2007},
  pages     = {852--857},
  url       = {https://mlanthology.org/aaai/2007/roberts2007aaai-authorial/}
}