The RL Perceptron: Dynamics of Policy Learning in High Dimensions

Abstract

Reinforcement learning (RL) algorithms have proven transformative in a range of domains. To tackle real-world problems, these systems often use neural networks to learn policies directly from pixels or other high-dimensional sensory input. By contrast, much of the theory of RL has focused on discrete state spaces or worst-case analyses, and fundamental questions remain about the dynamics of policy learning in high-dimensional settings. Here we propose a simple high-dimensional model of RL and derive its typical learning dynamics as a set of closed-form ODEs. We show that the model exhibits rich behavior, including delayed learning under sparse rewards; a speed-accuracy trade-off depending on reward stringency; and a dependence of the learning regime on reward baselines. These results offer a first step toward understanding policy-gradient methods in high-dimensional settings.
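
The abstract only sketches the setup, but its basic ingredients (a perceptron-like policy trained by policy gradient from sparse, episodic rewards against a baseline) can be illustrated in a few lines. Below is a minimal, hypothetical sketch assuming a teacher-student setup with a stochastic sign policy, an all-or-nothing episodic reward, and a fixed reward baseline; the dimension `D`, episode length `T`, learning rate, and baseline value are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 500                                        # input dimension (illustrative)
T = 5                                          # decisions per episode (illustrative)
lr = 0.1                                       # learning rate (illustrative)
baseline = 0.5                                 # reward baseline (illustrative)

w_teacher = rng.normal(size=D) / np.sqrt(D)    # ground-truth weight vector
w = np.zeros(D)                                # student / policy weights

for episode in range(2000):
    X = rng.normal(size=(T, D)) / np.sqrt(D)   # T high-dimensional inputs
    labels = np.sign(X @ w_teacher)            # correct actions in {-1, +1}

    # Stochastic sign policy: P(action = +1 | x) = sigmoid(2 * w . x)
    logits = X @ w
    probs = 1.0 / (1.0 + np.exp(-2.0 * logits))
    actions = np.where(rng.random(T) < probs, 1.0, -1.0)

    # Sparse episodic reward: 1 only if every decision in the episode is correct
    reward = float(np.all(actions == labels))

    # REINFORCE-style update with a baseline, summed over the episode
    grad_logp = (actions - (2.0 * probs - 1.0))[:, None] * X   # d log pi / d w per step
    w += lr * (reward - baseline) * grad_logp.sum(axis=0)
```

With an all-or-nothing reward of this kind, early episodes rarely produce any positive learning signal, which gives one intuition for the delayed learning under sparse rewards mentioned in the abstract.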

Cite

Text

Patel et al. "The RL Perceptron: Dynamics of Policy Learning in High Dimensions." ICLR 2023 Workshops: Physics4ML, 2023.

Markdown

[Patel et al. "The RL Perceptron: Dynamics of Policy Learning in High Dimensions." ICLR 2023 Workshops: Physics4ML, 2023.](https://mlanthology.org/iclrw/2023/patel2023iclrw-rl/)

BibTeX

@inproceedings{patel2023iclrw-rl,
  title     = {{The RL Perceptron: Dynamics of Policy Learning in High Dimensions}},
  author    = {Patel, Nishil and Lee, Sebastian and Mannelli, Stefano Sarao and Goldt, Sebastian and Saxe, Andrew M},
  booktitle = {ICLR 2023 Workshops: Physics4ML},
  year      = {2023},
  url       = {https://mlanthology.org/iclrw/2023/patel2023iclrw-rl/}
}