Procedural Generation of Meta-Reinforcement Learning Tasks

Abstract

Open-endedness stands to benefit from the ability to generate an infinite variety of diverse, challenging environments. One particularly interesting type of challenge is meta-learning ("learning-to-learn"), a hallmark of intelligent behavior. However, the number of meta-learning environments in the literature is limited. Here we describe a parametrized space for simple meta-reinforcement learning (meta-RL) tasks with arbitrary stimuli. The parametrization allows us to randomly generate an arbitrary number of novel simple meta-learning tasks. It is expressive enough to include many well-known meta-RL tasks, such as bandit problems, the Harlow task, T-mazes, the Daw two-step task, and others. Simple extensions allow it to capture tasks based on two-dimensional topological spaces, such as full mazes or find-the-spot domains. We describe a number of randomly generated meta-RL domains of varying complexity and discuss potential issues arising from random generation.
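
To illustrate the general idea (not the paper's actual parametrization), the sketch below shows the simplest special case the abstract mentions, a bandit problem, as a minimal meta-RL environment: the hidden task parameters (arm reward probabilities) are resampled at the start of every episode, so the agent must re-learn the current task within each episode. All names here (`RandomBanditMetaTask`, its arguments) are hypothetical and for illustration only.

```python
import numpy as np

class RandomBanditMetaTask:
    """A toy meta-RL task: a multi-armed bandit whose reward probabilities
    are resampled every episode. Learning within an episode, across many
    randomly generated episodes, is what makes the problem 'meta'."""

    def __init__(self, n_arms=2, episode_len=20, seed=None):
        self.n_arms = n_arms
        self.episode_len = episode_len
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        # Resample hidden task parameters: each new episode is a new task.
        self.probs = self.rng.uniform(0.0, 1.0, size=self.n_arms)
        self.t = 0
        return self._obs()

    def step(self, action):
        # Bernoulli reward from the chosen arm's hidden probability.
        reward = float(self.rng.random() < self.probs[action])
        self.t += 1
        done = self.t >= self.episode_len
        return self._obs(), reward, done

    def _obs(self):
        # A bandit has no informative state; return a constant observation.
        return np.zeros(1, dtype=np.float32)

if __name__ == "__main__":
    env = RandomBanditMetaTask(n_arms=5, episode_len=50, seed=0)
    env.reset()
    done, total = False, 0.0
    while not done:  # random policy, for demonstration only
        _, r, done = env.step(env.rng.integers(env.n_arms))
        total += r
    print(f"episode return under a random policy: {total}")
```

The paper's parametrization is far more general (covering Harlow-style tasks, T-mazes, the Daw two-step task, and 2D domains); this sketch only conveys the episode-level resampling that such generated tasks share.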

Cite

Text

Miconi. "Procedural Generation of Meta-Reinforcement Learning Tasks." NeurIPS 2023 Workshops: ALOE, 2023.

Markdown

[Miconi. "Procedural Generation of Meta-Reinforcement Learning Tasks." NeurIPS 2023 Workshops: ALOE, 2023.](https://mlanthology.org/neuripsw/2023/miconi2023neuripsw-procedural/)

BibTeX

@inproceedings{miconi2023neuripsw-procedural,
  title     = {{Procedural Generation of Meta-Reinforcement Learning Tasks}},
  author    = {Miconi, Thomas},
  booktitle = {NeurIPS 2023 Workshops: ALOE},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/miconi2023neuripsw-procedural/}
}