Automatic Shaping and Decomposition of Reward Functions

Abstract

This paper investigates the problem of automatically learning how to restructure the reward function of a Markov decision process so as to speed up reinforcement learning. We begin by describing a method that learns a shaped reward function given a set of state and temporal abstractions. Next, we consider decomposition of the per-timestep reward in multieffector problems, in which the overall agent can be decomposed into multiple units that are concurrently carrying out various tasks. We show by example that to find a good reward decomposition, it is often necessary to first shape the rewards appropriately. We then give a function approximation algorithm for solving both problems together. Standard reinforcement learning algorithms can be augmented with our methods, and we show experimentally that in each case, significantly faster learning results.
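As background for the abstract's notion of a "shaped reward function," the sketch below illustrates generic potential-based reward shaping (the standard construction that leaves optimal policies unchanged), not the paper's own learning algorithm. The potential function `phi` and the goal state are hypothetical stand-ins for a value estimate that might be derived from state abstractions.

```python
GAMMA = 0.9  # discount factor (assumed for this example)

def phi(state):
    # Hypothetical potential: negative distance to an assumed goal at state 10.
    return -abs(10 - state)

def shaped_reward(reward, state, next_state, gamma=GAMMA):
    # Potential-based shaping term F(s, s') = gamma * phi(s') - phi(s);
    # adding F to the per-timestep reward preserves the optimal policy.
    return reward + gamma * phi(next_state) - phi(state)

# Moving toward the goal (state 5 -> 6) earns a positive shaping bonus.
print(shaped_reward(0.0, 5, 6))  # 0.9 * (-4) - (-5) = 1.4
```

An agent trained on `shaped_reward` receives denser feedback than one trained on the raw reward alone, which is the speed-up effect the paper's automatic method aims to obtain without hand-designing `phi`.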

Cite

Text

Marthi. "Automatic Shaping and Decomposition of Reward Functions." International Conference on Machine Learning, 2007. doi:10.1145/1273496.1273572

Markdown

[Marthi. "Automatic Shaping and Decomposition of Reward Functions." International Conference on Machine Learning, 2007.](https://mlanthology.org/icml/2007/marthi2007icml-automatic/) doi:10.1145/1273496.1273572

BibTeX

@inproceedings{marthi2007icml-automatic,
  title     = {{Automatic Shaping and Decomposition of Reward Functions}},
  author    = {Marthi, Bhaskara},
  booktitle = {International Conference on Machine Learning},
  year      = {2007},
  pages     = {601--608},
  doi       = {10.1145/1273496.1273572},
  url       = {https://mlanthology.org/icml/2007/marthi2007icml-automatic/}
}