Analyzing the Sensitivity to Policy-Value Decoupling in Deep Reinforcement Learning Generalization
Abstract
Policy-value representation asymmetry negatively affects the generalization capability of traditional actor-critic architectures that use a shared representation for the policy and the value function. Fully decoupled (separate) networks for policy and value avoid overfitting by addressing this representation asymmetry, but maintaining two separate networks introduces additional computational overhead. Recent work has also shown that partial separation can achieve the same level of generalization in most tasks while reducing this overhead. This raises several questions: Do we really need two separate networks? Is there any scenario where only full separation works? Does increasing the degree of separation in a partially separated network improve generalization? In this work, we analyze generalization performance with respect to the extent of decoupling between the policy and value networks. We compare four degrees of network separation, namely fully shared, early separation, late separation, and full separation, on the RL generalization benchmark Procgen, a suite of 16 procedurally generated environments. We show that unless the environment has a distinct or explicit source of value estimation, partial late separation readily captures the necessary policy-value representation asymmetry and achieves better generalization performance in unseen scenarios, whereas early separation fails to produce good results.
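To make the four degrees of separation concrete, here is a minimal PyTorch sketch of an actor-critic module with a configurable split point. It is illustrative only: the class name `ActorCritic`, the `mode` flag, and the MLP trunks are our own assumptions for exposition (the paper's Procgen experiments would use a convolutional encoder), not the authors' exact architecture.

```python
import torch
import torch.nn as nn


def mlp(in_dim, hidden=64):
    """Two-layer MLP feature extractor (stand-in for a conv encoder)."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
    )


class ActorCritic(nn.Module):
    """Actor-critic with a configurable degree of policy-value decoupling.

    mode = 'shared' : one trunk feeds both heads (fully shared)
    mode = 'early'  : fork after the first layer (early separation)
    mode = 'late'   : fork after the last hidden layer (late separation)
    mode = 'full'   : two entirely independent networks (full separation)
    """

    def __init__(self, obs_dim, n_actions, hidden=64, mode="shared"):
        super().__init__()
        self.mode = mode
        if mode == "shared":
            self.trunk = mlp(obs_dim, hidden)
        elif mode == "early":
            # Share only the earliest layer, then separate trunks.
            self.stem = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
            self.pi_trunk = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())
            self.v_trunk = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())
        elif mode == "late":
            # Share most of the network; separate only near the heads.
            self.trunk = mlp(obs_dim, hidden)
            self.pi_trunk = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())
            self.v_trunk = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())
        elif mode == "full":
            self.pi_trunk = mlp(obs_dim, hidden)
            self.v_trunk = mlp(obs_dim, hidden)
        self.pi_head = nn.Linear(hidden, n_actions)  # policy logits
        self.v_head = nn.Linear(hidden, 1)           # state-value estimate

    def forward(self, obs):
        if self.mode == "shared":
            h = self.trunk(obs)
            hp, hv = h, h
        elif self.mode == "early":
            h = self.stem(obs)
            hp, hv = self.pi_trunk(h), self.v_trunk(h)
        elif self.mode == "late":
            h = self.trunk(obs)
            hp, hv = self.pi_trunk(h), self.v_trunk(h)
        else:  # 'full': no shared parameters at all
            hp, hv = self.pi_trunk(obs), self.v_trunk(obs)
        return self.pi_head(hp), self.v_head(hv)


if __name__ == "__main__":
    net = ActorCritic(obs_dim=8, n_actions=4, mode="late")
    logits, value = net(torch.randn(2, 8))
    print(logits.shape, value.shape)  # torch.Size([2, 4]) torch.Size([2, 1])
```

The only difference between the modes is where the computation graph forks: the policy and value heads are always distinct linear layers, so the degree of separation is determined entirely by how many trunk parameters the two streams share.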
Cite
Text
Nafi et al. "Analyzing the Sensitivity to Policy-Value Decoupling in Deep Reinforcement Learning Generalization." NeurIPS 2022 Workshops: DeepRL, 2022.
Markdown
[Nafi et al. "Analyzing the Sensitivity to Policy-Value Decoupling in Deep Reinforcement Learning Generalization." NeurIPS 2022 Workshops: DeepRL, 2022.](https://mlanthology.org/neuripsw/2022/nafi2022neuripsw-analyzing/)
BibTeX
@inproceedings{nafi2022neuripsw-analyzing,
  title     = {{Analyzing the Sensitivity to Policy-Value Decoupling in Deep Reinforcement Learning Generalization}},
  author    = {Nafi, Nasik Muhammad and Ali, Raja Farrukh and Hsu, William},
  booktitle = {NeurIPS 2022 Workshops: DeepRL},
  year      = {2022},
  url       = {https://mlanthology.org/neuripsw/2022/nafi2022neuripsw-analyzing/}
}