Deep Reinforcement Learning via Past-Success Directed Exploration

Abstract

The balance between exploration and exploitation is a core challenge in reinforcement learning. This paper proposes the “past-success exploration strategy combined with Softmax action selection” (PSE-Softmax), an adaptive control method that exploits the characteristics of the agent's online learning process to adjust exploration parameters dynamically. The proposed strategy is evaluated on OpenAI Gym across discrete and continuous control tasks, and the experimental results show that PSE-Softmax outperforms deep reinforcement learning algorithms equipped with basic exploration strategies.
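The abstract only names the mechanism, so the sketch below is a minimal illustration rather than the paper's algorithm: Boltzmann (Softmax) action selection whose temperature is cooled as a running estimate of past episode returns improves. The class `PastSuccessTemperature`, its parameters (`tau_max`, `tau_min`, `alpha`), and the success ratio driving the adaptation are hypothetical stand-ins for the adaptation rule defined in the paper.

```python
import numpy as np

def softmax_action(q_values, temperature):
    """Boltzmann (Softmax) action selection over action values."""
    prefs = np.asarray(q_values, dtype=np.float64) / temperature
    prefs -= prefs.max()  # subtract max for numerical stability
    probs = np.exp(prefs)
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)

class PastSuccessTemperature:
    """Hypothetical past-success-driven temperature schedule.

    Cools the Softmax temperature as a running average of episode
    returns approaches the best return seen so far; this is an
    illustrative stand-in, not the paper's exact rule.
    """
    def __init__(self, tau_max=1.0, tau_min=0.05, alpha=0.05):
        self.tau_max, self.tau_min, self.alpha = tau_max, tau_min, alpha
        self.avg_return = 0.0    # running estimate of past success
        self.best_return = 1e-8  # best episode return observed so far

    def update(self, episode_return):
        # Exponential moving average of episode returns.
        self.avg_return += self.alpha * (episode_return - self.avg_return)
        self.best_return = max(self.best_return, episode_return)

    def value(self):
        # More past success (average close to best) -> lower temperature,
        # i.e., more exploitation; little success -> higher temperature.
        success = float(np.clip(self.avg_return / self.best_return, 0.0, 1.0))
        return self.tau_max - success * (self.tau_max - self.tau_min)

# Usage: pick actions with the current temperature, then feed the
# episode return back so exploration adapts over training.
temp = PastSuccessTemperature()
action = softmax_action([0.2, 0.5, 0.1], temp.value())
temp.update(episode_return=10.0)
```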

Cite

Text

Liu et al. "Deep Reinforcement Learning via Past-Success Directed Exploration." AAAI Conference on Artificial Intelligence, 2019. doi:10.1609/AAAI.V33I01.33019979

Markdown

[Liu et al. "Deep Reinforcement Learning via Past-Success Directed Exploration." AAAI Conference on Artificial Intelligence, 2019.](https://mlanthology.org/aaai/2019/liu2019aaai-deep-a/) doi:10.1609/AAAI.V33I01.33019979

BibTeX

@inproceedings{liu2019aaai-deep-a,
  title     = {{Deep Reinforcement Learning via Past-Success Directed Exploration}},
  author    = {Liu, Xiaoming and Xu, Zhixiong and Cao, Lei and Chen, Xiliang and Kang, Kai},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2019},
  pages     = {9979--9980},
  doi       = {10.1609/AAAI.V33I01.33019979},
  url       = {https://mlanthology.org/aaai/2019/liu2019aaai-deep-a/}
}