Efficient Multi-Horizon Learning for Off-Policy Reinforcement Learning

Abstract

Value estimates at multiple timescales can help create advanced discounting functions and allow agents to form more effective predictive models of their environment. In this work, we investigate learning over multiple horizons concurrently for off-policy deep reinforcement learning, using an efficient architecture that combines a deeper network with the crucial components of Rainbow, a popular value-based off-policy algorithm. Our proposed agent learns over multiple horizons simultaneously and selects actions with an advantage-based method, in which either an exponential or a hyperbolic discounting function is used to estimate the advantage that guides the acting policy. We test our approach on the Procgen benchmark, a collection of procedurally generated environments, to demonstrate its effectiveness and, specifically, to evaluate the agent's performance in previously unseen scenarios.
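
The following is a minimal sketch, not the authors' implementation, of one common way multi-horizon value heads can support hyperbolic discounting: each head is assumed to estimate a Q-value under its own exponential discount factor, and the identity 1/(1 + t) = ∫₀¹ γᵗ dγ motivates combining the heads with simple Riemann-sum weights before taking a greedy (advantage-based) action. All function names, the discretization, and the number of heads are illustrative assumptions.

```python
import numpy as np

def riemann_weights(gammas):
    """Hypothetical left Riemann-sum weights over gamma in (0, 1).

    Head i is assumed to cover the interval [gamma_i, gamma_{i+1}),
    with the last interval closed at 1.0, so its weight is the
    interval width. Other discretizations are equally possible.
    """
    edges = np.append(gammas, 1.0)
    return np.diff(edges)

def hyperbolic_q(per_head_q, gammas):
    """Combine per-head Q-values of shape (num_heads, num_actions),
    each learned under exponential discount gamma_i, into a single
    approximately hyperbolically-discounted Q estimate."""
    w = riemann_weights(gammas)
    return (w[:, None] * per_head_q).sum(axis=0)

def advantage_greedy_action(per_head_q, gammas):
    """Greedy action under the combined estimate.

    The advantage A(s, a) = Q(s, a) - mean_a Q(s, a) only shifts the
    baseline, so its argmax matches that of Q; it is written out here
    to mirror the advantage-based selection described in the abstract.
    """
    q = hyperbolic_q(per_head_q, gammas)
    advantage = q - q.mean()
    return int(np.argmax(advantage))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gammas = np.linspace(0.05, 0.99, num=10)  # one exponential discount per head (illustrative)
    per_head_q = rng.normal(size=(10, 4))     # e.g. 10 heads, 4 actions
    print(advantage_greedy_action(per_head_q, gammas))
```

Swapping `riemann_weights` for weights derived purely from a single exponential discount would recover standard (single-horizon) greedy action selection, which is the exponential-discounting variant mentioned in the abstract.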

Cite

Text

Ali et al. "Efficient Multi-Horizon Learning for Off-Policy Reinforcement Learning." NeurIPS 2022 Workshops: DeepRL, 2022.

Markdown

[Ali et al. "Efficient Multi-Horizon Learning for Off-Policy Reinforcement Learning." NeurIPS 2022 Workshops: DeepRL, 2022.](https://mlanthology.org/neuripsw/2022/ali2022neuripsw-efficient/)

BibTeX

@inproceedings{ali2022neuripsw-efficient,
  title     = {{Efficient Multi-Horizon Learning for Off-Policy Reinforcement Learning}},
  author    = {Ali, Raja Farrukh and Nafi, Nasik Muhammad and Duong, Kevin and Hsu, William},
  booktitle = {NeurIPS 2022 Workshops: DeepRL},
  year      = {2022},
  url       = {https://mlanthology.org/neuripsw/2022/ali2022neuripsw-efficient/}
}