Finite-Time Analysis of Whittle Index Based Q-Learning for Restless Multi-Armed Bandits with Neural Network Function Approximation

Xiong, Guojun; Li, Jian

Finite-Time Analysis of Whittle Index Based Q-Learning for Restless Multi-Armed Bandits with Neural Network Function Approximation

NeurIPS 2023

/neurips/2023/xiong2023neurips-finitetime/

Abstract

Whittle index policy is a heuristic to the intractable restless multi-armed bandits (RMAB) problem. Although it is provably asymptotically optimal, finding Whittle indices remains difficult. In this paper, we present Neural-Q-Whittle, a Whittle index based Q-learning algorithm for RMAB with neural network function approximation, which is an example of nonlinear two-timescale stochastic approximation with Q-function values updated on a faster timescale and Whittle indices on a slower timescale. Despite the empirical success of deep Q-learning, the non-asymptotic convergence rate of Neural-Q-Whittle, which couples neural networks with two-timescale Q-learning largely remains unclear. This paper provides a finite-time analysis of Neural-Q-Whittle, where data are generated from a Markov chain, and Q-function is approximated by a ReLU neural network. Our analysis leverages a Lyapunov drift approach to capture the evolution of two coupled parameters, and the nonlinearity in value function approximation further requires us to characterize the approximation error. Combing these provide Neural-Q-Whittle with $\mathcal{O}(1/k^{2/3})$ convergence rate, where $k$ is the number of iterations.

PDF NeurIPS OpenReview Semantic Scholar

Cite

Text

Xiong and Li. "Finite-Time Analysis of Whittle Index Based Q-Learning for Restless Multi-Armed Bandits with Neural Network Function Approximation." Neural Information Processing Systems, 2023.

Markdown

[Xiong and Li. "Finite-Time Analysis of Whittle Index Based Q-Learning for Restless Multi-Armed Bandits with Neural Network Function Approximation." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/xiong2023neurips-finitetime/)

BibTeX

@inproceedings{xiong2023neurips-finitetime,
  title     = {{Finite-Time Analysis of Whittle Index Based Q-Learning for Restless Multi-Armed Bandits with Neural Network Function Approximation}},
  author    = {Xiong, Guojun and Li, Jian},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/xiong2023neurips-finitetime/}
}