Learning Intrinsic Rewards as a Bi-Level Optimization Problem
Abstract
We reinterpret the problem of finding intrinsic rewards in reinforcement learning (RL) as a bi-level optimization problem. Using this interpretation, we can make use of recent advances in the hyperparameter optimization literature, mainly from Self-Tuning Networks (STNs), to learn intrinsic rewards. To facilitate our method, we introduce a new general conditioning layer: Conditional Layer Normalization (CLN). We evaluate our method on several continuous control benchmarks in the MuJoCo physics simulator. On all of these benchmarks, the intrinsic rewards learned on the fly lead to higher final rewards.
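Read informally, the bi-level view described in the abstract can be written in generic notation (a sketch of the setup, not the paper's exact symbols): the inner problem trains policy parameters \(\theta\) against the extrinsic reward plus an intrinsic reward parameterized by \(\phi\), while the outer problem tunes \(\phi\) so that the resulting policy does well on the extrinsic return alone:

\[
\max_{\phi} \; J_{\mathrm{ext}}\!\left(\theta^{*}(\phi)\right)
\quad \text{subject to} \quad
\theta^{*}(\phi) \in \arg\max_{\theta} \; J_{\mathrm{ext}}(\theta) + J_{\mathrm{int}}(\theta, \phi).
\]

For the conditioning layer mentioned above, the sketch below shows one plausible reading of a layer-normalization variant whose gain and bias are generated from a conditioning vector; the class name, argument names, and initialization are illustrative assumptions, not the paper's code.

import torch
import torch.nn as nn

class ConditionalLayerNorm(nn.Module):
    """Layer norm whose affine terms are generated from a conditioning
    vector (illustrative sketch; details may differ from the paper)."""

    def __init__(self, num_features, cond_dim):
        super().__init__()
        # Parameter-free normalization; the affine part comes from the condition.
        self.norm = nn.LayerNorm(num_features, elementwise_affine=False)
        # Hypothetical generators for the per-feature scale and shift.
        self.to_gamma = nn.Linear(cond_dim, num_features)
        self.to_beta = nn.Linear(cond_dim, num_features)
        # Start as plain layer normalization: zero scale offset, zero shift.
        nn.init.zeros_(self.to_gamma.weight)
        nn.init.zeros_(self.to_gamma.bias)
        nn.init.zeros_(self.to_beta.weight)
        nn.init.zeros_(self.to_beta.bias)

    def forward(self, x, cond):
        # x: (batch, num_features); cond: (batch, cond_dim)
        gamma = self.to_gamma(cond)
        beta = self.to_beta(cond)
        return (1.0 + gamma) * self.norm(x) + beta

Zero-initializing the generators means the layer behaves like ordinary layer normalization until the conditioning pathway learns to modulate it, which keeps early training stable.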
Cite
Text
Stadie et al. "Learning Intrinsic Rewards as a Bi-Level Optimization Problem." Uncertainty in Artificial Intelligence, 2020.

Markdown

[Stadie et al. "Learning Intrinsic Rewards as a Bi-Level Optimization Problem." Uncertainty in Artificial Intelligence, 2020.](https://mlanthology.org/uai/2020/stadie2020uai-learning/)

BibTeX
@inproceedings{stadie2020uai-learning,
title = {{Learning Intrinsic Rewards as a Bi-Level Optimization Problem}},
author = {Stadie, Bradly and Zhang, Lunjun and Ba, Jimmy},
booktitle = {Uncertainty in Artificial Intelligence},
year = {2020},
pages = {111--120},
volume = {124},
url = {https://mlanthology.org/uai/2020/stadie2020uai-learning/}
}