Multi-Horizon Learning in Procedurally-Generated Environments for Off-Policy Reinforcement Learning (Student Abstract)
Abstract
Value estimates at multiple timescales can help create advanced discounting functions and allow agents to form more effective predictive models of their environment. In this work, we investigate learning over multiple horizons concurrently for off-policy reinforcement learning by using an advantage-based action selection method and introducing architectural improvements. Our proposed agent learns over multiple horizons simultaneously, using either exponential or hyperbolic discounting functions. We implement our approach on Rainbow, a value-based off-policy algorithm, and test on Procgen, a collection of procedurally-generated environments, to demonstrate the effectiveness of this approach, specifically its performance in previously unseen scenarios.
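A minimal sketch of the core idea described in the abstract: maintain one value head per discount factor and form a separate TD target for each horizon. The paper itself builds on Rainbow; the function names, discount factors, and the hyperbolic-weighting scheme below (based on the integral identity 1/(1 + kt) = (1/k) ∫₀¹ g^(1/k − 1) g^t dg from Fedus et al., 2019, which the multi-horizon literature commonly uses) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def multi_horizon_td_targets(reward, next_values, done, gammas):
    """One-step TD target per horizon: r + gamma_i * (1 - done) * V_i(s').

    `next_values` holds one bootstrapped estimate per discount factor,
    so each value head learns at its own timescale.
    """
    return [reward + g * (1.0 - done) * v for g, v in zip(gammas, next_values)]

def hyperbolic_weights(gammas, k):
    """Riemann-sum weights so that sum_i w_i * gamma_i**t roughly tracks
    the hyperbolic discount 1 / (1 + k*t); a coarse approximation when
    only a few gammas are used."""
    gammas = np.asarray(gammas, dtype=float)
    widths = np.diff(np.append(gammas, 1.0))  # left Riemann intervals
    return widths * (1.0 / k) * gammas ** (1.0 / k - 1.0)

# Illustrative horizons (hypothetical values, not from the paper).
gammas = [0.5, 0.9, 0.99]
targets = multi_horizon_td_targets(1.0, [10.0, 20.0, 30.0], 0.0, gammas)
weights = hyperbolic_weights(gammas, k=0.1)
```

With exponential discounting the per-horizon targets are used directly; with hyperbolic discounting the per-horizon value estimates are combined through weights like the ones above.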
Cite
Text
Ali et al. "Multi-Horizon Learning in Procedurally-Generated Environments for Off-Policy Reinforcement Learning (Student Abstract)." AAAI Conference on Artificial Intelligence, 2023. doi:10.1609/AAAI.V37I13.26935
Markdown
[Ali et al. "Multi-Horizon Learning in Procedurally-Generated Environments for Off-Policy Reinforcement Learning (Student Abstract)." AAAI Conference on Artificial Intelligence, 2023.](https://mlanthology.org/aaai/2023/ali2023aaai-multi/) doi:10.1609/AAAI.V37I13.26935
BibTeX
@inproceedings{ali2023aaai-multi,
title = {{Multi-Horizon Learning in Procedurally-Generated Environments for Off-Policy Reinforcement Learning (Student Abstract)}},
author = {Ali, Raja Farrukh and Duong, Kevin and Nafi, Nasik Muhammad and Hsu, William H.},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2023},
pages = {16150-16151},
doi = {10.1609/AAAI.V37I13.26935},
url = {https://mlanthology.org/aaai/2023/ali2023aaai-multi/}
}