Replay Memory as an Empirical MDP: Combining Conservative Estimation with Experience Replay

Zhang, Hongming; Xiao, Chenjun; Wang, Han; Jin, Jun; Xu, Bo; Müller, Martin

ICLR 2023

Abstract

Experience replay, which stores transitions in a replay memory for repeated use, plays an important role in improving sample efficiency in reinforcement learning. Existing techniques such as reweighted sampling, episodic learning, and reverse sweep updates further process the information in the replay memory to make experience replay more efficient. In this work, we exploit this information further by treating the replay memory as an empirical Replay Memory MDP (RM-MDP). By solving it with dynamic programming, we learn a conservative value estimate that only considers transitions observed in the replay memory. We develop both value and policy regularizers based on this conservative estimate and integrate them with model-free learning algorithms. We also design a metric, memory density, to measure the quality of the RM-MDP. Our empirical studies find a strong quantitative correlation between performance improvement and memory density. Our method, Conservative Estimation with Experience Replay (CEER), improves sample efficiency by a large margin, especially when memory density is high. Even when memory density is low, the conservative estimate can still help avoid suicidal actions and thereby improve performance.
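The dynamic-programming step on the RM-MDP can be illustrated with a minimal tabular sketch. This is not the authors' code: it assumes discrete, hashable states and actions, and it estimates empirical transition probabilities and mean rewards from visit counts. The function name `solve_rm_mdp` and its parameters are hypothetical. The key property is that each backup maximizes only over actions actually stored in memory for the next state. Unseen actions therefore never contribute optimistic values. States with no stored outgoing transitions are assigned value 0, which is a further assumption of this sketch.

```python
from collections import defaultdict

_TERMINAL = object()  # sentinel marking episode termination


def solve_rm_mdp(transitions, gamma=0.99, iters=1000, tol=1e-8):
    """Value iteration on an empirical MDP built from replayed transitions.

    transitions: iterable of (state, action, reward, next_state, done) tuples.
    Returns a dict (state, action) -> Q estimate, defined only for pairs
    that appear in the replay memory (hypothetical helper, not from the paper).
    """
    next_counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': count}
    reward_sum = defaultdict(float)                       # (s, a) -> summed reward
    visits = defaultdict(int)                             # (s, a) -> visit count
    actions_at = defaultdict(set)                         # s -> actions seen in memory

    for s, a, r, s_next, done in transitions:
        key = (s, a)
        visits[key] += 1
        reward_sum[key] += r
        next_counts[key][_TERMINAL if done else s_next] += 1
        actions_at[s].add(a)

    q = {key: 0.0 for key in visits}
    for _ in range(iters):
        # V(s) = max over actions observed at s only; unknown states default to 0.
        v = {s: max(q[(s, a)] for a in acts) for s, acts in actions_at.items()}
        new_q, delta = {}, 0.0
        for key, n in visits.items():
            # Expected next-state value under empirical transition frequencies.
            expected_next = sum(
                c * (0.0 if s2 is _TERMINAL else v.get(s2, 0.0))
                for s2, c in next_counts[key].items()
            ) / n
            new_q[key] = reward_sum[key] / n + gamma * expected_next
            delta = max(delta, abs(new_q[key] - q[key]))
        q = new_q
        if delta < tol:  # stop once the Bellman backup has converged
            break
    return q


memory = [
    ("s0", "right", 0.0, "s1", False),
    ("s1", "right", 1.0, "s2", True),
    ("s0", "left", 0.0, "s0", False),
]
q = solve_rm_mdp(memory, gamma=0.9)
print(q)
```

In this example `q[("s1", "right")]` is 1.0, `q[("s0", "right")]` is 0.9, and `q[("s0", "left")]` is about 0.81. There is no entry for `("s1", "left")` because that action never occurs in memory. This is the sense in which the estimate only considers observed transitions.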

Cite

Text

Zhang et al. "Replay Memory as an Empirical MDP: Combining Conservative Estimation with Experience Replay." International Conference on Learning Representations, 2023.

Markdown

[Zhang et al. "Replay Memory as an Empirical MDP: Combining Conservative Estimation with Experience Replay." International Conference on Learning Representations, 2023.](https://mlanthology.org/iclr/2023/zhang2023iclr-replay/)

BibTeX

@inproceedings{zhang2023iclr-replay,
  title     = {{Replay Memory as an Empirical MDP: Combining Conservative Estimation with Experience Replay}},
  author    = {Zhang, Hongming and Xiao, Chenjun and Wang, Han and Jin, Jun and Xu, Bo and Müller, Martin},
  booktitle = {International Conference on Learning Representations},
  year      = {2023},
  url       = {https://mlanthology.org/iclr/2023/zhang2023iclr-replay/}
}