Optimizing Sharpe Ratio: Risk-Adjusted Decision-Making in Multi-Armed Bandits

Abstract

The Sharpe ratio (SR) is a critical parameter in characterizing financial time series, as it jointly considers the reward and the volatility of any stock/portfolio through its mean and standard deviation. Deriving online algorithms for optimizing the SR is particularly challenging since even offline policies experience constant regret with respect to the best expert (Even-Dar et al., 2006). This paper focuses on optimizing the regularized square SR (RSSR) by considering two settings: regret minimization (RM) and best arm identification (BAI). In this regard, we propose a novel multi-armed bandit (MAB) algorithm for RM, called UCB-RSSR, for RSSR maximization. We derive a path-dependent concentration bound for the estimate of the RSSR. Based on that, we derive the regret guarantees of UCB-RSSR and show that it evolves as $\mathcal{O}(\log n)$ for the two-armed bandit case played for a horizon $n$. We also consider algorithms for the fixed budget setting of the BAI problem, i.e., sequential halving and successive rejects, and propose the SHSR and SuRSR algorithms. We derive the upper bound for the error probability of the BAI algorithms. We demonstrate that UCB-RSSR outperforms the only other known SR optimizing bandit algorithm, U-UCB (Cassel et al., 2023). We also study the efficacy of the proposed BAI algorithms for 6 different setups and discuss the cases where our proposed algorithms are suitable. Our research highlights that our proposed algorithms will find extensive applications in risk-aware portfolio management problems.
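To make the fixed-budget BAI setting concrete, here is a minimal Python sketch of generic sequential halving driven by a Sharpe-style score. It is not the paper's SHSR algorithm. The abstract does not state the RSSR formula, so the score used here (squared empirical mean divided by a regularizer plus the empirical variance) is an assumption, as are the names `regularized_squared_sharpe`, `sequential_halving`, `gaussian_arm`, and the parameter `reg`.

```python
import math
import random
import statistics


def regularized_squared_sharpe(rewards, reg=1.0):
    """Squared empirical mean over (reg + empirical variance).

    ASSUMPTION: illustrative form only; the abstract does not give the
    paper's exact RSSR definition.
    """
    mean = statistics.fmean(rewards)
    var = statistics.pvariance(rewards, mu=mean)
    return mean * mean / (reg + var)


def sequential_halving(arms, budget, score=regularized_squared_sharpe, rng=None):
    """Generic fixed-budget sequential halving.

    `arms` is a list of callables taking a random.Random and returning one
    reward. The budget is split evenly across rounds; each round draws fresh
    samples for the surviving arms and keeps the better-scoring half.
    Returns the index of the last surviving arm.
    """
    rng = rng or random.Random(0)
    active = list(range(len(arms)))
    rounds = max(1, math.ceil(math.log2(len(arms))))
    for _ in range(rounds):
        # At least two pulls per arm so the variance estimate is meaningful.
        pulls = max(2, budget // (len(active) * rounds))
        scores = {
            i: score([arms[i](rng) for _ in range(pulls)]) for i in active
        }
        active.sort(key=lambda i: scores[i], reverse=True)
        active = active[: max(1, math.ceil(len(active) / 2))]
    return active[0]


def gaussian_arm(mean, std):
    """Hypothetical arm with Gaussian rewards, used only for the demo."""
    return lambda rng: rng.gauss(mean, std)


if __name__ == "__main__":
    arms = [
        gaussian_arm(0.1, 1.0),
        gaussian_arm(0.2, 1.0),
        gaussian_arm(1.0, 0.5),  # high mean, low volatility
        gaussian_arm(0.3, 2.0),
    ]
    print(sequential_halving(arms, budget=4000, rng=random.Random(0)))
```

Under this assumed score, the third arm has by far the largest value, so with a budget of a few thousand pulls the procedure should return its index. Swapping in a different `score` callable changes the risk criterion without touching the elimination logic.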

Cite

Text

Khurshid et al. "Optimizing Sharpe Ratio: Risk-Adjusted Decision-Making in Multi-Armed Bandits." Machine Learning, 2025. doi:10.1007/S10994-024-06680-2

Markdown

[Khurshid et al. "Optimizing Sharpe Ratio: Risk-Adjusted Decision-Making in Multi-Armed Bandits." Machine Learning, 2025.](https://mlanthology.org/mlj/2025/khurshid2025mlj-optimizing/) doi:10.1007/S10994-024-06680-2

BibTeX

@article{khurshid2025mlj-optimizing,
  title     = {{Optimizing Sharpe Ratio: Risk-Adjusted Decision-Making in Multi-Armed Bandits}},
  author    = {Khurshid, Sabrina and Abdulla, Mohammed Shahid and Ghatak, Gourab},
  journal   = {Machine Learning},
  year      = {2025},
  pages     = {32},
  doi       = {10.1007/S10994-024-06680-2},
  volume    = {114},
  url       = {https://mlanthology.org/mlj/2025/khurshid2025mlj-optimizing/}
}