Complexity and Cooperation in Q-Learning
Abstract
This chapter describes two cooperative learning algorithms that can reduce search and decouple the learning rate from state-space size. The first algorithm, called Learning with an External Critic (LEC), is based on the idea of a mentor who watches the learner and generates immediate rewards in response to its most recent actions. This reward is then used temporarily to bias the learner's control strategy. The second algorithm, called Learning By Watching ( LBW), is based on the idea that an agent can gain experience vicariously by relating the observed behavior of others to its own. While LEC algorithms require interaction with knowledgeable agents, LBW algorithms can be effective even when interaction is with equally naive peers. The search time complexity is analyzed for pure unbiased Q-learning, LEC, and LB W algorithms for an important class of state spaces. Generally, the results indicate that unbiased Q-learning can have a search time that is exponential in the depth of the state space, while the LEC and LB W algorithms require at most time linear in the state space size and under appropriate conditions, time independent of the state space size and proportional to the length of the optimal solution path. Homogeneous state spaces are useful for studying the scaling properties of reinforcement learning algorithms because they are analytically tractable.
Cite
Text
Whitehead. "Complexity and Cooperation in Q-Learning." International Conference on Machine Learning, 1991. doi:10.1016/B978-1-55860-200-7.50075-1Markdown
[Whitehead. "Complexity and Cooperation in Q-Learning." International Conference on Machine Learning, 1991.](https://mlanthology.org/icml/1991/whitehead1991icml-complexity/) doi:10.1016/B978-1-55860-200-7.50075-1BibTeX
@inproceedings{whitehead1991icml-complexity,
title = {{Complexity and Cooperation in Q-Learning}},
author = {Whitehead, Steven D.},
booktitle = {International Conference on Machine Learning},
year = {1991},
pages = {363-367},
doi = {10.1016/B978-1-55860-200-7.50075-1},
url = {https://mlanthology.org/icml/1991/whitehead1991icml-complexity/}
}