Recomposing the Reinforcement Learning Building Blocks with Hypernetworks

Abstract

The Reinforcement Learning (RL) building blocks, i.e. $Q$-functions and policy networks, usually take elements from the Cartesian product of two domains as input. In particular, the input of the $Q$-function is both the state and the action, and in multi-task problems (Meta-RL) the policy can take a state and a context. Standard architectures tend to ignore these variables' underlying interpretations and simply concatenate their features into a single vector. In this work, we argue that this choice may lead to poor gradient estimation in actor-critic algorithms and high-variance learning steps in Meta-RL algorithms. To account for the interaction between the input variables, we suggest using a Hypernetwork architecture, where a primary network determines the weights of a conditional dynamic network. We show that this approach improves the gradient approximation and reduces the learning-step variance, which both accelerates learning and improves the final performance. We demonstrate a consistent improvement across different locomotion tasks and different algorithms, both in RL (TD3 and SAC) and in Meta-RL (MAML and PEARL).
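To make the architectural idea concrete, below is a minimal PyTorch sketch of a hypernetwork $Q$-function in the spirit of the abstract: a primary network consumes the state and emits the weights of a small dynamic network that processes the action. All sizes and names (e.g. `HyperQNetwork`, the 256-unit primary hidden layer) are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: hypernetwork Q-function Q(s, a) where a state-conditioned
# primary network produces the parameters of a dynamic action network.
import torch
import torch.nn as nn


class HyperQNetwork(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.action_dim = action_dim
        self.hidden_dim = hidden_dim
        # Number of parameters of the dynamic network:
        # one hidden layer (W1, b1) plus a scalar head (w2, b2).
        n_params = (action_dim * hidden_dim + hidden_dim) + (hidden_dim + 1)
        # Primary network: maps the state to the dynamic network's parameters.
        self.primary = nn.Sequential(
            nn.Linear(state_dim, 256),
            nn.ReLU(),
            nn.Linear(256, n_params),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        batch = state.shape[0]
        params = self.primary(state)
        # Slice the flat parameter vector into the dynamic network's weights.
        i = 0
        w1 = params[:, i:i + self.action_dim * self.hidden_dim].view(
            batch, self.hidden_dim, self.action_dim)
        i += self.action_dim * self.hidden_dim
        b1 = params[:, i:i + self.hidden_dim]
        i += self.hidden_dim
        w2 = params[:, i:i + self.hidden_dim]
        i += self.hidden_dim
        b2 = params[:, i:i + 1]
        # Dynamic network: a per-state MLP applied to the action.
        h = torch.relu(torch.bmm(w1, action.unsqueeze(-1)).squeeze(-1) + b1)
        q = (w2 * h).sum(dim=-1, keepdim=True) + b2
        return q


# Hypothetical usage with MuJoCo-like dimensions:
# q_net = HyperQNetwork(state_dim=17, action_dim=6)
# q_values = q_net(states, actions)  # shape (batch, 1)
```

The same pattern applies to the Meta-RL case by letting the primary network take the task context and generate the weights of a state-conditioned policy network.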

Cite

Text

Sarafian et al. "Recomposing the Reinforcement Learning Building Blocks with Hypernetworks." International Conference on Machine Learning, 2021.

Markdown

[Sarafian et al. "Recomposing the Reinforcement Learning Building Blocks with Hypernetworks." International Conference on Machine Learning, 2021.](https://mlanthology.org/icml/2021/sarafian2021icml-recomposing/)

BibTeX

@inproceedings{sarafian2021icml-recomposing,
  title     = {{Recomposing the Reinforcement Learning Building Blocks with Hypernetworks}},
  author    = {Sarafian, Elad and Keynan, Shai and Kraus, Sarit},
  booktitle = {International Conference on Machine Learning},
  year      = {2021},
  pages     = {9301--9312},
  volume    = {139},
  url       = {https://mlanthology.org/icml/2021/sarafian2021icml-recomposing/}
}