Value Functions for RL-Based Behavior Transfer: A Comparative Study

Abstract

Temporal difference (TD) learning methods (Sutton & Barto 1998) have become popular reinforcement learning techniques in recent years. TD methods, relying on function approximators to generalize learning to novel situations, have had some experimental successes and have been shown to exhibit some desirable properties in theory, but have often been found slow in practice. This paper presents methods for further generalizing across tasks, thereby speeding up learning, via a novel form of behavior transfer. We compare learning on a complex task with three function approximators, a CMAC, a neural network, and an RBF, and demonstrate that behavior transfer works well with all three. Using behavior transfer, agents are able to learn one task and then markedly reduce the time it takes to learn a more complex task. Our algorithms are fully implemented and tested in the RoboCup soccer Keepaway domain.
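The core idea in the abstract, seeding a target task's value function with one learned on a simpler source task, can be sketched in a toy setting. The snippet below is a minimal illustration, not the paper's method: it uses tabular Q-learning on a 1-D chain task (the paper uses Sarsa with CMAC, neural-network, and RBF approximators on Keepaway), and the `td_learn` and `transfer` names are hypothetical. Transfer here simply copies value estimates for the states the two tasks share, so learning on the larger chain starts from informed values rather than zeros.

```python
import random

def td_learn(n_states, episodes, q=None, alpha=0.5, gamma=0.9, eps=0.1):
    """Tabular Q-learning on a 1-D chain: start at state 0, reward +1 for
    reaching the goal state n_states-1. Actions: 0 = left, 1 = right.
    Passing a pre-filled `q` seeds learning from a previous task."""
    if q is None:
        q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy action selection
            if random.random() < eps:
                a = random.randrange(2)
            else:
                a = int(q[s][1] >= q[s][0])
            s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
            r = 1.0 if s2 == n_states - 1 else 0.0
            # TD update toward the bootstrapped target
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

def transfer(q_src, n_states_tgt):
    """Initialize the target task's value table from the source task,
    copying entries for states the two tasks have in common."""
    q_tgt = [[0.0, 0.0] for _ in range(n_states_tgt)]
    for i in range(min(len(q_src), n_states_tgt)):
        q_tgt[i] = list(q_src[i])
    return q_tgt

random.seed(0)
q_small = td_learn(5, 200)            # learn the simpler task first
q_init = transfer(q_small, 8)         # map its values into the larger task
q_large = td_learn(8, 200, q=q_init)  # continue learning from the seed
```

In Keepaway the analogous step is richer: the learned weights of one task's function approximator initialize the next task's approximator, with inter-task mappings handling the changed state and action spaces.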

Cite

Text

Taylor et al. "Value Functions for RL-Based Behavior Transfer: A Comparative Study." AAAI Conference on Artificial Intelligence, 2005.

Markdown

[Taylor et al. "Value Functions for RL-Based Behavior Transfer: A Comparative Study." AAAI Conference on Artificial Intelligence, 2005.](https://mlanthology.org/aaai/2005/taylor2005aaai-value/)

BibTeX

@inproceedings{taylor2005aaai-value,
  title     = {{Value Functions for RL-Based Behavior Transfer: A Comparative Study}},
  author    = {Taylor, Matthew E. and Stone, Peter and Liu, Yaxin},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2005},
  pages     = {880--885},
  url       = {https://mlanthology.org/aaai/2005/taylor2005aaai-value/}
}