DQL: A New Updating Strategy for Reinforcement Learning Based on Q-Learning
Abstract
In reinforcement learning an autonomous agent learns an optimal policy while interacting with the environment. In particular, in one-step Q-learning, with each action an agent updates its Q values considering immediate rewards. In this paper a new strategy for updating Q values is proposed. The strategy, implemented in an algorithm called DQL, uses a set of agents that all search for the same goal in the same space to obtain the same optimal policy. Each agent leaves traces over a copy of the environment (copies of Q-values) while searching for a goal. These copies are used by the agents to decide which actions to take. Once all the agents reach a goal, the original Q-values of the best solution found by all the agents are updated using Watkins’ Q-learning formula. DQL has some similarities with Gambardella’s Ant-Q algorithm [4]; however, it does not require the definition of a domain-dependent heuristic, and consequently avoids the tuning of additional parameters. Unlike Ant-Q, DQL also does not update the original Q-values with zero reward while the agents are searching. It is shown how DQL’s guided exploration by several agents with selective exploitation (updating only the best solution) produces faster convergence than Q-learning and Ant-Q on several test bed problems under similar conditions.
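The update scheme described above can be sketched in Python. This is a minimal illustration, not the paper's implementation: the five-state chain environment, the epsilon-greedy action selection, and the use of shortest path length as the "best solution" criterion are all assumptions made here for the sake of a runnable example. What it does show is the two-level structure the abstract describes: each agent explores using its own copy of the Q-values (leaving traces only on the copy), and only the best trajectory is applied to the original Q-values with Watkins' one-step rule.

```python
import random
from collections import defaultdict

ACTIONS = (+1, -1)   # moves in a toy chain world (illustrative, not from the paper)
GOAL, N = 4, 5       # goal state and number of states in the chain

def step(s, a):
    """Deterministic toy environment: walk the chain, reward 1 on reaching the goal."""
    s2 = min(max(s + a, 0), N - 1)
    return s2, (1.0 if s2 == GOAL else 0.0)

def q_update(Q, s, a, r, s2, alpha=0.1, gamma=0.9):
    # Watkins' one-step rule: Q(s,a) += alpha * (r + gamma * max_b Q(s',b) - Q(s,a))
    best_next = max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def dql_episode(Q, n_agents=4, epsilon=0.2, rng=random):
    """One DQL episode: agents search on copies of Q; only the best path updates Q."""
    trajectories = []
    for _ in range(n_agents):
        q_copy = Q.copy()                   # agent's private copy of the Q-values
        s, path = 0, []
        while s != GOAL:
            if rng.random() < epsilon:      # epsilon-greedy selection on the copy
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda b: q_copy[(s, b)])
            s2, r = step(s, a)
            q_update(q_copy, s, a, r, s2)   # trace is left on the copy only
            path.append((s, a, r, s2))
            s = s2
        trajectories.append(path)
    best = min(trajectories, key=len)       # "best solution" here: shortest path found
    for s, a, r, s2 in best:                # only the best trajectory touches the original Q
        q_update(Q, s, a, r, s2)
    return Q
```

After repeated episodes, e.g. `Q = defaultdict(float)` followed by `for _ in range(300): dql_episode(Q)`, the greedy policy in the original Q-values moves toward the goal. The key design point mirrored from the abstract is that the original Q-values are never updated with zero reward during search, since only the completed best trajectory feeds back into them.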
Cite
Text
Mariano and Morales. "DQL: A New Updating Strategy for Reinforcement Learning Based on Q-Learning." European Conference on Machine Learning, 2001. doi:10.1007/3-540-44795-4_28
Markdown
[Mariano and Morales. "DQL: A New Updating Strategy for Reinforcement Learning Based on Q-Learning." European Conference on Machine Learning, 2001.](https://mlanthology.org/ecmlpkdd/2001/mariano2001ecml-dql/) doi:10.1007/3-540-44795-4_28
BibTeX
@inproceedings{mariano2001ecml-dql,
title = {{DQL: A New Updating Strategy for Reinforcement Learning Based on Q-Learning}},
author = {Mariano, Carlos Eduardo and Morales, Eduardo F.},
booktitle = {European Conference on Machine Learning},
year = {2001},
pages = {324--335},
doi = {10.1007/3-540-44795-4_28},
url = {https://mlanthology.org/ecmlpkdd/2001/mariano2001ecml-dql/}
}