Rethinking Discount Regularization: New Interpretations, Unintended Consequences, and Solutions for Regularization in Reinforcement Learning

Abstract

Discount regularization, i.e., using a shorter planning horizon (a lower discount factor) when calculating the optimal policy, is a popular choice to avoid overfitting when faced with sparse or noisy data. It is commonly interpreted as de-emphasizing or ignoring delayed effects. In this paper, we prove two alternative views of discount regularization that expose unintended consequences and motivate novel regularization methods. In model-based RL, planning under a lower discount factor acts like a prior with stronger regularization on state-action pairs with more transition data. This leads to poor performance when the transition matrix is estimated from data sets with uneven amounts of data across state-action pairs. In model-free RL, discount regularization equates to planning using a weighted average Bellman update, where the agent plans as if the values of all state-action pairs are closer than implied by the data. Our equivalence theorems motivate simple methods that generalize discount regularization by setting parameters locally for individual state-action pairs rather than globally. We demonstrate the failures of discount regularization and how we remedy them using our state-action-specific methods across empirical examples with both tabular and continuous state spaces.
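For context, below is a minimal sketch (not from the paper) of how discount regularization is typically applied in tabular value iteration: the agent plans with a discount factor smaller than the one used to evaluate the resulting policy. The names `P_hat`, `R_hat`, `gamma_plan`, and `gamma_eval` are illustrative assumptions, not identifiers from the paper.

```python
import numpy as np

def value_iteration(P, R, gamma, tol=1e-8, max_iters=10_000):
    """Tabular value iteration under a given planning discount factor.

    P: estimated transition tensor, shape (S, A, S)
    R: estimated reward matrix, shape (S, A)
    gamma: discount factor used for planning
    """
    S, A, _ = P.shape
    V = np.zeros(S)
    for _ in range(max_iters):
        # Bellman optimality update: Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) V(s')
        Q = R + gamma * (P @ V)        # shape (S, A)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    Q = R + gamma * (P @ V)            # greedy policy at the (approximately) converged V
    return V, Q.argmax(axis=1)

# Discount regularization: plan with a smaller discount than the evaluation discount,
# i.e. gamma_plan < gamma_eval, shortening the effective planning horizon globally.
# gamma_eval = 0.99
# gamma_plan = 0.90
# V_reg, pi_reg = value_iteration(P_hat, R_hat, gamma_plan)
```

The paper's alternative, state-action-specific regularization methods replace this single global parameter with locally set parameters; the sketch above shows only the standard global form the paper reinterprets.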

Cite

Text

Rathnam et al. "Rethinking Discount Regularization: New Interpretations, Unintended Consequences, and Solutions for Regularization in Reinforcement Learning." Journal of Machine Learning Research, 2024.

Markdown

[Rathnam et al. "Rethinking Discount Regularization: New Interpretations, Unintended Consequences, and Solutions for Regularization in Reinforcement Learning." Journal of Machine Learning Research, 2024.](https://mlanthology.org/jmlr/2024/rathnam2024jmlr-rethinking/)

BibTeX

@article{rathnam2024jmlr-rethinking,
  title     = {{Rethinking Discount Regularization: New Interpretations, Unintended Consequences, and Solutions for Regularization in Reinforcement Learning}},
  author    = {Rathnam, Sarah and Parbhoo, Sonali and Swaroop, Siddharth and Pan, Weiwei and Murphy, Susan A. and Doshi-Velez, Finale},
  journal   = {Journal of Machine Learning Research},
  year      = {2024},
  pages     = {1--48},
  volume    = {25},
  url       = {https://mlanthology.org/jmlr/2024/rathnam2024jmlr-rethinking/}
}