Exploring Parameter Space with Structured Noise for Meta-Reinforcement Learning

Abstract

Efficient exploration is a major challenge in Reinforcement Learning (RL) and has been studied extensively. However, when facing a new task, existing methods explore either by taking actions that maximize task-agnostic objectives (such as information gain) or by applying a simple dithering strategy (such as noise injection), which may not be effective enough. In this paper, we investigate whether previous learning experiences can be leveraged to guide exploration in a new task. To this end, we propose a novel Exploration with Structured Noise in Parameter Space (ESNPS) approach. ESNPS utilizes meta-learning and directly uses meta-policy parameters, which contain prior knowledge, as structured noise to perturb the base model for effective exploration in new tasks. Experimental results on four groups of tasks: cheetah velocity, cheetah direction, ant velocity and ant direction, demonstrate the superiority of ESNPS against a number of competitive baselines.
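The core idea, perturbing a base policy's parameters with noise structured by a meta-learned prior rather than with purely isotropic noise, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the mixing scheme, `sigma`, and the parameter vectors are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical flattened policy parameters (for illustration only):
# theta_base is the current base policy; theta_meta is a meta-learned
# policy whose parameters encode prior knowledge from earlier tasks.
theta_base = rng.normal(size=8)
theta_meta = rng.normal(size=8)

def perturb_with_structured_noise(theta_base, theta_meta, sigma=0.1, rng=rng):
    """Return exploration parameters built from the base policy.

    The perturbation mixes a structured direction derived from the
    meta-policy (prior knowledge) with isotropic Gaussian noise.
    This particular mixture is an assumed stand-in for the ESNPS update.
    """
    structured = theta_meta - theta_base          # direction informed by the prior
    isotropic = rng.normal(size=theta_base.shape)  # simple dithering component
    return theta_base + sigma * (structured + isotropic)

theta_explore = perturb_with_structured_noise(theta_base, theta_meta)
print(theta_explore.shape)
```

Because the structured component points toward behavior that worked on related tasks, exploration is biased toward plausible policies instead of wandering uniformly in parameter space, which is the intuition the abstract describes.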

Cite

Text

Xu et al. "Exploring Parameter Space with Structured Noise for Meta-Reinforcement Learning." International Joint Conference on Artificial Intelligence, 2020. doi:10.24963/IJCAI.2020/436

Markdown

[Xu et al. "Exploring Parameter Space with Structured Noise for Meta-Reinforcement Learning." International Joint Conference on Artificial Intelligence, 2020.](https://mlanthology.org/ijcai/2020/xu2020ijcai-exploring/) doi:10.24963/IJCAI.2020/436

BibTeX

@inproceedings{xu2020ijcai-exploring,
  title     = {{Exploring Parameter Space with Structured Noise for Meta-Reinforcement Learning}},
  author    = {Xu, Hui and Zhang, Chong and Wang, Jiaxing and Ouyang, Deqiang and Zheng, Yu and Shao, Jie},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2020},
  pages     = {3153--3159},
  doi       = {10.24963/IJCAI.2020/436},
  url       = {https://mlanthology.org/ijcai/2020/xu2020ijcai-exploring/}
}