Variance-Aware Decision Making with Linear Function Approximation Under Heavy-Tailed Rewards

Abstract

This paper studies how to achieve variance-aware regret bounds for online decision-making in the presence of heavy-tailed rewards with only finite variances. For linear stochastic bandits, we address the issue of heavy-tailed rewards by modifying adaptive Huber regression and proposing AdaOFUL. AdaOFUL achieves a state-of-the-art regret bound of $\widetilde{\mathcal{O}}\big(d\big(\sum_{t=1}^T \nu_{t}^2\big)^{1/2}+d\big)$ as if the rewards were uniformly bounded, where $\nu_{t}^2$ is the conditional variance of the reward at round $t$, $d$ is the feature dimension, and $T$ is the number of online rounds. Building upon AdaOFUL, we propose VARA for linear MDPs, which achieves a variance-aware regret bound of $\widetilde{\mathcal{O}}(d\sqrt{H\mathcal{G}^*K})$. Here, $H$ is the length of episodes, $K$ is the number of episodes, and $\mathcal{G}^*$ is a smaller instance-dependent quantity that can be bounded by other instance-dependent quantities when additional structural conditions on the MDP are satisfied. Overall, our modified adaptive Huber regression algorithm may serve as a useful building block in the design of algorithms for online problems with heavy-tailed rewards.
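To illustrate the core idea behind Huber regression for heavy-tailed rewards, the sketch below fits a linear model under Student-t noise by minimizing the Huber loss with gradient descent. This is a minimal illustration, not the paper's AdaOFUL algorithm: the threshold `tau`, step size, and iteration count are illustrative choices, and AdaOFUL additionally adapts the robustification parameter and maintains confidence sets online.

```python
import numpy as np

def huber_grad(residuals, tau):
    # Derivative of the Huber loss in the residual: identity inside
    # [-tau, tau], clipped outside, which caps the influence of
    # heavy-tailed outliers on the estimate.
    return np.clip(residuals, -tau, tau)

def huber_regression(X, y, tau, lr=0.1, n_iters=2000):
    """Minimize (1/n) * sum_i huber_tau(y_i - x_i^T theta) by gradient descent."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        residuals = y - X @ theta
        grad = -X.T @ huber_grad(residuals, tau) / len(y)
        theta -= lr * grad
    return theta

rng = np.random.default_rng(0)
n, d = 500, 3
theta_star = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(n, d))
# Student-t noise with 2.5 degrees of freedom: finite variance, heavy tails,
# matching the paper's setting of rewards with only finite variances.
noise = rng.standard_t(df=2.5, size=n)
y = X @ theta_star + noise

theta_hat = huber_regression(X, y, tau=2.0)
```

The clipping threshold `tau` trades robustness against bias: a small `tau` discards information from large residuals, while `tau = inf` recovers ordinary least squares, whose error can be inflated by heavy-tailed noise.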

Cite

Text

Li and Sun. "Variance-Aware Decision Making with Linear Function Approximation Under Heavy-Tailed Rewards." Transactions on Machine Learning Research, 2024.

Markdown

[Li and Sun. "Variance-Aware Decision Making with Linear Function Approximation Under Heavy-Tailed Rewards." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/li2024tmlr-varianceaware/)

BibTeX

@article{li2024tmlr-varianceaware,
  title     = {{Variance-Aware Decision Making with Linear Function Approximation Under Heavy-Tailed Rewards}},
  author    = {Li, Xiang and Sun, Qiang},
  journal   = {Transactions on Machine Learning Research},
  year      = {2024},
  url       = {https://mlanthology.org/tmlr/2024/li2024tmlr-varianceaware/}
}