Risk‑Seeking Reinforcement Learning via Multi‑Timescale EVaR Optimization

Abstract

Tail-aware objectives shape an agent's behavior under uncertainty and can yield policies that depart markedly from risk-neutral ones. Risk measures such as Value at Risk (VaR) and Conditional Value at Risk (CVaR) have shown promising results in reinforcement learning. In this paper, we study the incorporation of a relatively new coherent risk measure, Entropic Value at Risk (EVaR), as a high-return, risk-seeking objective for the agent to maximize. We propose a multi-timescale stochastic approximation algorithm that searches for the optimal parameterized EVaR policy. The algorithm enables effective exploration of high-return tails and robust gradient approximation for optimizing the EVaR objective. We analyze the asymptotic behavior of the proposed algorithm and rigorously evaluate it across various discrete and continuous benchmark environments. The results show that the EVaR policy achieves higher cumulative returns and corroborate that EVaR is indeed a competitive risk-seeking objective for RL.
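
For reference, EVaR at confidence level 1−α is the tightest Chernoff bound on VaR: for a random variable X whose moment-generating function is finite, it is defined as

\mathrm{EVaR}_{1-\alpha}(X) \;=\; \inf_{z>0} \frac{1}{z} \log\!\left( \frac{\mathbb{E}\!\left[e^{zX}\right]}{\alpha} \right), \qquad \alpha \in (0,1].

Applied to the episode return, a larger EVaR rewards heavier upper tails, which is why it serves as a risk-seeking objective. As a minimal illustration of the objective itself (a sketch, not the authors' multi-timescale algorithm), the definition can be evaluated on Monte-Carlo return samples by a one-dimensional minimization over z; the function name evar_estimate and the search interval for z below are illustrative assumptions.

import numpy as np
from scipy.optimize import minimize_scalar

def evar_estimate(returns, alpha):
    # Plug-in EVaR_{1-alpha} estimate from sampled returns:
    # inf_{z>0} (1/z) * log(E[exp(z*X)] / alpha) on the empirical distribution.
    x = np.asarray(returns, dtype=float)

    def chernoff_bound(z):
        # log E[exp(z*X)] via the log-sum-exp trick for numerical stability
        m = np.max(z * x)
        log_mgf = m + np.log(np.mean(np.exp(z * x - m)))
        return (log_mgf - np.log(alpha)) / z

    # One-dimensional minimization over z > 0 on a bounded interval (assumed range)
    res = minimize_scalar(chernoff_bound, bounds=(1e-6, 50.0), method="bounded")
    return res.fun

# Example: EVaR of simulated episode returns at confidence level 0.95
rng = np.random.default_rng(0)
sampled_returns = rng.normal(loc=1.0, scale=0.5, size=10_000)
print(evar_estimate(sampled_returns, alpha=0.05))

For Gaussian returns this estimate approaches mu + sigma * sqrt(2 * log(1/alpha)), which lies strictly above the corresponding VaR, consistent with EVaR upper-bounding VaR at the same confidence level.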

Cite

Text

Ganguly et al. "Risk‑Seeking Reinforcement Learning via Multi‑Timescale EVaR Optimization." Transactions on Machine Learning Research, 2025.

Markdown

[Ganguly et al. "Risk‑Seeking Reinforcement Learning via Multi‑Timescale EVaR Optimization." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/ganguly2025tmlr-riskseeking/)

BibTeX

@article{ganguly2025tmlr-riskseeking,
  title     = {{Risk‑Seeking Reinforcement Learning via Multi‑Timescale EVaR Optimization}},
  author    = {Ganguly, Deep Kumar and Joseph, Ajin George and Girotra, Sarthak and Sekhar, Sirish},
  journal   = {Transactions on Machine Learning Research},
  year      = {2025},
  url       = {https://mlanthology.org/tmlr/2025/ganguly2025tmlr-riskseeking/}
}