Authorial Idioms for Target Distributions in TTD-MDPs
Abstract
In designing Markov Decision Processes (MDPs), one must define the world, its dynamics, a set of actions, and a reward function. MDPs are often applied in situations where there is not a clear choice of reward function, and in these cases significant care must be taken to construct a reward function that induces the desired behavior. In this paper, we consider an analogous design problem: crafting a target distribution in Targeted Trajectory Distribution MDPs (TTD-MDPs). TTD-MDPs are an extension of MDPs that provides variety of experience during repeated execution by producing probabilistic policies that minimize divergence from a target distribution over trajectories of an underlying MDP. Here, we present a brief overview of TTD-MDPs along with approaches for constructing target distributions. We then present a novel authorial idiom for creating target distributions using prototype trajectories, and evaluate these approaches on a drama manager for an interactive game.
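The core mechanism the abstract alludes to — turning a target distribution over complete trajectories into local, per-step action probabilities — can be sketched as follows. This is a minimal illustration under simplifying assumptions (deterministic dynamics, a finite trajectory tree); the names and data layout are illustrative, not taken from the paper. Each internal node chooses an action with probability proportional to the total target mass in that action's subtree, so the product of choices along any path reproduces the target probability of that trajectory.

```python
# Hypothetical sketch: trajectories form a tree. A leaf holds the target
# probability of the complete trajectory ending there; an internal node
# maps an action name to a subtree.

def subtree_mass(tree):
    """Total target probability contained in a subtree."""
    if isinstance(tree, float):
        return tree  # leaf: target probability of one complete trajectory
    return sum(subtree_mass(child) for child in tree.values())

def local_policy(tree):
    """Action distribution at this node: proportional to subtree mass."""
    total = subtree_mass(tree)
    return {action: subtree_mass(child) / total
            for action, child in tree.items()}

# Target: 0.5 on trajectory a->c, 0.25 each on b->d and b->e.
tree = {"a": {"c": 0.5}, "b": {"d": 0.25, "e": 0.25}}
print(local_policy(tree))       # {'a': 0.5, 'b': 0.5}
print(local_policy(tree["b"]))  # {'d': 0.5, 'e': 0.5}
```

Multiplying the local probabilities along any root-to-leaf path recovers the leaf's target probability exactly (e.g. b then d: 0.5 × 0.5 = 0.25), which is the sense in which the induced policy matches the target distribution when the dynamics are deterministic.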
Cite
Text
Roberts et al. "Authorial Idioms for Target Distributions in TTD-MDPs." AAAI Conference on Artificial Intelligence, 2007.
Markdown
[Roberts et al. "Authorial Idioms for Target Distributions in TTD-MDPs." AAAI Conference on Artificial Intelligence, 2007.](https://mlanthology.org/aaai/2007/roberts2007aaai-authorial/)
BibTeX
@inproceedings{roberts2007aaai-authorial,
title = {{Authorial Idioms for Target Distributions in TTD-MDPs}},
author = {Roberts, David L. and Bhat, Sooraj and St. Clair, Kenneth and Isbell, Jr., Charles Lee},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2007},
pages = {852--857},
url = {https://mlanthology.org/aaai/2007/roberts2007aaai-authorial/}
}