Variance-Aware Decision Making with Linear Function Approximation Under Heavy-Tailed Rewards
Abstract
This paper studies how to achieve variance-aware regrets for online decision-making in the presence of heavy-tailed rewards with only finite variances. For linear stochastic bandits, we address the issue of heavy-tailed rewards by modifying the adaptive Huber regression and proposing AdaOFUL. AdaOFUL achieves a state-of-the-art regret bound of $\widetilde{\mathcal{O}}\big(d\big(\sum_{t=1}^T \nu_{t}^2\big)^{1/2}+d\big)$ as if the rewards were uniformly bounded, where $\nu_{t}^2$ is the conditional variance of the reward at round $t$, $d$ is the feature dimension, {and $T$ is number of online rounds}. Building upon AdaOFUL, we propose VARA for linear MDPs, which achieves a variance-aware regret bound of $\widetilde{\mathcal{O}}(d\sqrt{H\mathcal{G}^*K})$. Here, $H$ is the length of episodes, $K$ is the number of episodes, and $\mathcal{G}^*$ is a smaller instance-dependent quantity that can be bounded by other instance-dependent quantities when additional structural conditions on the MDP are satisfied. Overall, our modified adaptive Huber regression algorithm may serve as a useful building block in the design of algorithms for online problems with heavy-tailed rewards.
Cite
Text
Li and Sun. "Variance-Aware Decision Making with Linear Function Approximation Under Heavy-Tailed Rewards." Transactions on Machine Learning Research, 2024.Markdown
[Li and Sun. "Variance-Aware Decision Making with Linear Function Approximation Under Heavy-Tailed Rewards." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/li2024tmlr-varianceaware/)BibTeX
@article{li2024tmlr-varianceaware,
title = {{Variance-Aware Decision Making with Linear Function Approximation Under Heavy-Tailed Rewards}},
author = {Li, Xiang and Sun, Qiang},
journal = {Transactions on Machine Learning Research},
year = {2024},
url = {https://mlanthology.org/tmlr/2024/li2024tmlr-varianceaware/}
}