Conjugated Discrete Distributions for Distributional Reinforcement Learning

Abstract

In this work we continue to build upon recent advances in reinforcement learning for finite Markov processes. A common approach among existing algorithms, both single-actor and distributed, is either to clip rewards or to apply a transformation to Q-functions in order to handle the wide range of magnitudes in real discounted returns. We show theoretically that one of the most successful of these methods may not yield an optimal policy when the process is non-deterministic. As a solution, we argue that distributional reinforcement learning lends itself to remedying this situation completely. By introducing a conjugated distributional operator, we can handle a large class of transformations of real returns with guaranteed theoretical convergence. We propose an approximating single-actor algorithm based on this operator that trains agents directly on unaltered rewards, using a proper distributional metric given by the Cramér distance. To evaluate its performance in a stochastic setting, we train agents on a suite of 55 Atari 2600 games using sticky actions and obtain state-of-the-art performance compared with other well-known algorithms in the Dopamine framework.
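The failure mode mentioned in the abstract can be illustrated with a toy one-step example. This is a sketch under stated assumptions, not the paper's construction: it assumes the Q-function transformation is the squashing function h(x) = sign(x)(sqrt(|x| + 1) - 1) + εx of Pohlen et al. (2018) (the abstract itself does not name the method), with ε = 0.01 chosen for illustration. The bandit, its returns, and all helper names are hypothetical. With γ = 0 the fixed point of a transformed Bellman operator is E[h(X)] rather than h(E[X]), and because h is nonlinear the two can rank actions differently. The `conjugated_backup` helper is one possible reading of "conjugated": the ordinary distributional backup z ↦ r + γz applied between h⁻¹ and h on each atom. That reading is an assumption, not the paper's definition.

```python
import math

EPS = 0.01  # illustrative epsilon for h (an assumption, not taken from the paper)


def h(x: float, eps: float = EPS) -> float:
    """Squashing transform h(x) = sign(x) * (sqrt(|x| + 1) - 1) + eps * x."""
    return math.copysign(math.sqrt(abs(x) + 1.0) - 1.0, x) + eps * x


def h_inv(y: float, eps: float = EPS) -> float:
    """Closed-form inverse of h, from solving a quadratic in u = sqrt(|x| + 1)."""
    u = (math.sqrt(1.0 + 4.0 * eps * (abs(y) + 1.0 + eps)) - 1.0) / (2.0 * eps)
    return math.copysign(u * u - 1.0, y)


def expectation(dist, f=lambda x: x):
    """E[f(X)] for a discrete distribution given as (value, probability) pairs."""
    return sum(p * f(x) for x, p in dist)


def conjugated_backup(atoms_h, reward, gamma):
    """Hypothetical conjugated backup: map each atom out of h-space, apply
    z -> reward + gamma * z, and map it back; probabilities are unchanged."""
    return [(h(reward + gamma * h_inv(z)), p) for z, p in atoms_h]


# Hypothetical one-step bandit (gamma = 0): action A pays 1.0 surely,
# action B pays 0.0 or 2.2 with probability 1/2 each.
returns = {"A": [(1.0, 1.0)], "B": [(0.0, 0.5), (2.2, 0.5)]}

true_q = {a: expectation(d) for a, d in returns.items()}            # E[X]
transformed_q = {a: expectation(d, h) for a, d in returns.items()}  # E[h(X)]

# Distributional alternative: keep atoms in h-space and invert h per atom.
atoms_h = {a: [(h(x), p) for x, p in d] for a, d in returns.items()}
recovered_q = {a: expectation(d, h_inv) for a, d in atoms_h.items()}

for name, q in [("true", true_q), ("transformed", transformed_q), ("recovered", recovered_q)]:
    print(name, max(q, key=q.get))
# true B
# transformed A
# recovered B

# One conjugated backup of B's atoms with reward 0.5 and gamma 0.9.
backed_up = conjugated_backup(atoms_h["B"], reward=0.5, gamma=0.9)
print([round(h_inv(z), 6) for z, _ in backed_up])  # [0.5, 2.48]
```

Here the transformed value ranks A above B (about 0.424 vs 0.405) even though B has the higher expected return (1.1 vs 1.0). Inverting h atom by atom before averaging recovers the correct ranking, because the full distribution, not just its transformed mean, is kept.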
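The abstract states that agents are trained with the Cramér distance as the distributional metric. For two distributions P and Q with CDFs F_P and F_Q, the ℓ2 Cramér distance is (∫ (F_P(x) − F_Q(x))² dx)^{1/2}; some authors call the squared quantity the Cramér distance instead. The sketch below computes it exactly for two discrete distributions that need not share a support. The function names are hypothetical, and this is not the paper's loss implementation.

```python
import math


def cdf(atoms, probs, x):
    """F(x) = P(Z <= x) for a discrete distribution with the given atoms."""
    return sum(p for z, p in zip(atoms, probs) if z <= x)


def cramer_distance(atoms_p, probs_p, atoms_q, probs_q):
    """l2 Cramér distance sqrt(integral of (F_P - F_Q)^2 dx) between two discrete
    distributions whose probabilities each sum to 1; supports may differ."""
    grid = sorted(set(atoms_p) | set(atoms_q))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        # Both CDFs are constant on [left, right), so this term is exact.
        diff = cdf(atoms_p, probs_p, left) - cdf(atoms_q, probs_q, left)
        total += diff * diff * (right - left)
    # Left of grid[0] both CDFs are 0 and right of grid[-1] both are 1,
    # so those regions contribute nothing.
    return math.sqrt(total)


print(cramer_distance([0.0, 1.0], [0.5, 0.5], [0.5], [1.0]))  # 0.5
print(cramer_distance([0.0], [1.0], [3.0], [1.0]))             # 1.7320508075688772
```

Because the CDF difference is piecewise constant between consecutive merged atoms, the integral reduces to a finite sum and no projection onto a fixed grid is needed. The quadratic-time `cdf` lookup keeps the sketch short; a running cumulative sum over the sorted grid would avoid it.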

Cite

Text

Lindenberg et al. "Conjugated Discrete Distributions for Distributional Reinforcement Learning." AAAI Conference on Artificial Intelligence, 2022. doi:10.1609/AAAI.V36I7.20716

Markdown

[Lindenberg et al. "Conjugated Discrete Distributions for Distributional Reinforcement Learning." AAAI Conference on Artificial Intelligence, 2022.](https://mlanthology.org/aaai/2022/lindenberg2022aaai-conjugated/) doi:10.1609/AAAI.V36I7.20716

BibTeX

@inproceedings{lindenberg2022aaai-conjugated,
  title     = {{Conjugated Discrete Distributions for Distributional Reinforcement Learning}},
  author    = {Lindenberg, Björn and Nordqvist, Jonas and Lindahl, Karl-Olof},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2022},
  pages     = {7516--7524},
  doi       = {10.1609/AAAI.V36I7.20716},
  url       = {https://mlanthology.org/aaai/2022/lindenberg2022aaai-conjugated/}
}