Efficient Average Reward Reinforcement Learning Using Constant Shifting Values
Abstract
There are two classes of average reward reinforcement learning (RL) algorithms: model-based ones that explicitly maintain MDP models and model-free ones that do not learn such models. Though model-free algorithms are known to be more efficient, they often cannot converge to optimal policies due to the perturbation of parameters. In this paper, a novel model-free algorithm is proposed, which makes use of constant shifting values (CSVs) estimated from prior knowledge. To encourage exploration during the learning process, the algorithm constantly subtracts the CSV from the rewards. A terminating condition is proposed to handle the unboundedness of Q-values caused by such subtraction. The convergence of the proposed algorithm is proved under very mild assumptions. Furthermore, linear function approximation is investigated to generalize our method to handle large-scale tasks. Extensive experiments on representative MDPs and the popular game Tetris show that the proposed algorithms significantly outperform the state-of-the-art ones.
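For intuition, below is a minimal sketch of the reward-shifting idea the abstract describes, written as a tabular Q-learning-style update in which a fixed shifting value rho is subtracted from every reward. The environment interface (reset/step), the threshold q_max used as a stand-in terminating condition, and all parameter names are illustrative assumptions, not the paper's exact algorithm or condition.

import numpy as np

def csv_q_learning(env, n_states, n_actions, rho, alpha=0.1, epsilon=0.1,
                   q_max=1e4, max_steps=100_000, rng=None):
    """Tabular Q-learning sketch with a constant shifting value (CSV) `rho`.

    `env` is assumed to expose `reset() -> state` and
    `step(action) -> (next_state, reward)`; this interface is hypothetical.
    """
    if rng is None:
        rng = np.random.default_rng()
    Q = np.zeros((n_states, n_actions))
    s = env.reset()
    for _ in range(max_steps):
        # epsilon-greedy action selection to encourage exploration
        if rng.random() < epsilon:
            a = int(rng.integers(n_actions))
        else:
            a = int(np.argmax(Q[s]))
        s_next, r = env.step(a)
        # shifted reward: subtract the constant shifting value from the reward
        target = (r - rho) + np.max(Q[s_next])
        Q[s, a] += alpha * (target - Q[s, a])
        # illustrative terminating condition: the abstract notes that the
        # subtraction can make Q-values unbounded, so stop once they exceed
        # a user-chosen threshold (a placeholder for the paper's condition)
        if np.max(np.abs(Q)) > q_max:
            break
        s = s_next
    return Q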
Cite
Text
Yang et al. "Efficient Average Reward Reinforcement Learning Using Constant Shifting Values." AAAI Conference on Artificial Intelligence, 2016. doi:10.1609/AAAI.V30I1.10285
Markdown
[Yang et al. "Efficient Average Reward Reinforcement Learning Using Constant Shifting Values." AAAI Conference on Artificial Intelligence, 2016.](https://mlanthology.org/aaai/2016/yang2016aaai-efficient/) doi:10.1609/AAAI.V30I1.10285
BibTeX
@inproceedings{yang2016aaai-efficient,
title = {{Efficient Average Reward Reinforcement Learning Using Constant Shifting Values}},
author = {Yang, Shangdong and Gao, Yang and An, Bo and Wang, Hao and Chen, Xingguo},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2016},
pages = {2258-2264},
doi = {10.1609/AAAI.V30I1.10285},
url = {https://mlanthology.org/aaai/2016/yang2016aaai-efficient/}
}