Reliability-Adjusted Prioritized Experience Replay

Abstract

Experience replay enables data-efficient learning from past experiences in online reinforcement learning agents. Traditionally, experiences were sampled uniformly from a replay buffer, regardless of differences in their learning potential. To sample more efficiently, researchers introduced Prioritized Experience Replay (PER). In this paper, we propose an extension to PER that introduces a novel measure of temporal difference error reliability. We show theoretically that the resulting transition selection algorithm, Reliability-adjusted Prioritized Experience Replay (ReaPER), enables more efficient learning than PER. We further present empirical results showing that ReaPER outperforms both uniform experience replay and PER across a diverse set of traditional environments, including several classic control tasks and the Atari-10 benchmark, which approximates the median score across the Atari-57 benchmark within one percent of variance.
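
For readers unfamiliar with the baseline, the following is a minimal sketch of standard proportional PER sampling, the method that ReaPER extends. It is not the ReaPER algorithm: the abstract does not define the reliability measure, so no reliability adjustment is shown. The function names and the default values of `alpha`, `beta`, and `eps` are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional PER: P(i) = p_i^alpha / sum_k p_k^alpha, with p_i = |delta_i| + eps.

    alpha = 0 recovers uniform sampling; larger alpha prioritizes more strongly.
    The defaults are illustrative assumptions.
    """
    priorities = (np.abs(np.asarray(td_errors, dtype=float)) + eps) ** alpha
    return priorities / priorities.sum()


def sample_batch(td_errors, batch_size, alpha=0.6, beta=0.4, rng=None):
    """Sample transition indices and importance-sampling weights.

    Weights w_i = (N * P(i))^(-beta) correct the bias introduced by
    non-uniform sampling. They are normalized by the batch maximum here,
    a common implementation choice.
    """
    rng = np.random.default_rng() if rng is None else rng
    probs = per_probabilities(td_errors, alpha)
    idx = rng.choice(len(probs), size=batch_size, p=probs)
    weights = (len(probs) * probs[idx]) ** (-beta)
    return idx, weights / weights.max()


if __name__ == "__main__":
    # Hypothetical TD errors for a buffer of five transitions.
    errors = [0.05, 0.2, 1.5, 0.01, 3.0]
    indices, weights = sample_batch(errors, batch_size=4)
    print(indices, weights)
```

Under this scheme, transitions with larger absolute temporal difference error are replayed more often. The abstract's proposal is to additionally account for how reliable each of those errors is.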

Cite

Text

Pleiss et al. "Reliability-Adjusted Prioritized Experience Replay." International Conference on Learning Representations, 2026.

Markdown

[Pleiss et al. "Reliability-Adjusted Prioritized Experience Replay." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/pleiss2026iclr-reliabilityadjusted/)

BibTeX

@inproceedings{pleiss2026iclr-reliabilityadjusted,
  title     = {{Reliability-Adjusted Prioritized Experience Replay}},
  author    = {Pleiss, Leonard S. and Sutter, Tobias and Schiffer, Maximilian},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/pleiss2026iclr-reliabilityadjusted/}
}