CASA: Bridging the Gap Between Policy Improvement and Policy Evaluation with Conflict Averse Policy Iteration

Abstract

We study the problem of model-free reinforcement learning, which is often solved following the principle of Generalized Policy Iteration (GPI). While GPI is typically an interplay between policy evaluation and policy improvement, most conventional model-free methods with function approximation treat the GPI steps as independent, despite the inherent connections between them. In this paper, we present a method that attempts to eliminate the inconsistency between the policy evaluation step and the policy improvement step, leading to a conflict-averse GPI solution with gradient-based function approximation. Our method is central to balancing exploitation and exploration between policy-based and value-based methods, and is applicable to existing policy-based and value-based methods. We conduct extensive experiments to study the theoretical properties of our method and demonstrate its effectiveness on the Atari 200M benchmark.
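To make the idea of a conflict-averse update concrete, below is a minimal, hypothetical sketch of how a gradient conflict between the policy-evaluation loss and the policy-improvement loss might be resolved for a shared set of parameters. The function name `conflict_averse_update` and the PCGrad-style projection are illustrative assumptions on our part, not the update rule defined in the CASA paper.

```python
import numpy as np

def conflict_averse_update(g_eval, g_improve, lr=1e-3):
    """Combine policy-evaluation and policy-improvement gradients.

    If the two gradients conflict (negative inner product), each is
    projected onto the normal plane of the other before averaging, so
    neither step undoes the other's progress. This is a PCGrad-style
    illustration, not CASA's actual algorithm.
    """
    g_e, g_i = g_eval.copy(), g_improve.copy()
    if np.dot(g_eval, g_improve) < 0:  # directions conflict
        g_e -= np.dot(g_eval, g_improve) / (np.dot(g_improve, g_improve) + 1e-12) * g_improve
        g_i -= np.dot(g_improve, g_eval) / (np.dot(g_eval, g_eval) + 1e-12) * g_eval
    return -lr * (g_e + g_i) / 2.0  # update for the shared parameters

# Toy usage: two conflicting gradients over two shared parameters
update = conflict_averse_update(np.array([1.0, -0.5]), np.array([-0.8, 1.2]))
```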

Cite

Text

Xiao et al. "CASA: Bridging the Gap Between Policy Improvement and Policy Evaluation with Conflict Averse Policy Iteration." NeurIPS 2022 Workshops: DeepRL, 2022.

Markdown

[Xiao et al. "CASA: Bridging the Gap Between Policy Improvement and Policy Evaluation with Conflict Averse Policy Iteration." NeurIPS 2022 Workshops: DeepRL, 2022.](https://mlanthology.org/neuripsw/2022/xiao2022neuripsw-casa/)

BibTeX

@inproceedings{xiao2022neuripsw-casa,
  title     = {{CASA: Bridging the Gap Between Policy Improvement and Policy Evaluation with Conflict Averse Policy Iteration}},
  author    = {Xiao, Changnan and Shi, Haosen and Fan, Jiajun and Deng, Shihong and Yin, Haiyan},
  booktitle = {NeurIPS 2022 Workshops: DeepRL},
  year      = {2022},
  url       = {https://mlanthology.org/neuripsw/2022/xiao2022neuripsw-casa/}
}