Multi-Bellman Operator for Convergence of $Q$-Learning with Linear Function Approximation

Abstract

We investigate the convergence of $Q$-learning with linear function approximation and introduce the multi-Bellman operator, an extension of the traditional Bellman operator. By analyzing the properties of this operator, we identify conditions under which the projected multi-Bellman operator becomes a contraction, yielding stronger fixed-point guarantees than the original Bellman operator provides. Building on these insights, we propose the multi-$Q$-learning algorithm, which converges and approximates the optimal solution with arbitrary precision. This contrasts with traditional $Q$-learning, which lacks such convergence guarantees. Finally, we empirically validate our theoretical results.
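
As a rough illustration of the idea, the sketch below assumes that the multi-Bellman operator is the $m$-fold composition of the Bellman optimality operator; the abstract does not state the definition, so this reading, the tabular random MDP, and the names `bellman_optimality`, `multi_bellman`, `P`, `R`, and `m` are all assumptions made here. It only checks the standard fact that composing a $\gamma$-contraction $m$ times gives a $\gamma^m$-contraction in the sup norm. It does not model the projection onto linear features, which is where the paper's conditions apply.

```python
import numpy as np

def bellman_optimality(Q, P, R, gamma):
    """One application of the Bellman optimality operator.

    Q: (S, A) action values, P: (S, A, S) transition probabilities,
    R: (S, A) expected rewards, gamma: discount factor in [0, 1).
    """
    # Expected greedy value of the next state, for every (state, action) pair.
    return R + gamma * P @ Q.max(axis=1)

def multi_bellman(Q, P, R, gamma, m):
    """Hypothetical multi-Bellman operator: apply the operator m times (assumption)."""
    for _ in range(m):
        Q = bellman_optimality(Q, P, R, gamma)
    return Q

# Small random MDP, purely illustrative.
rng = np.random.default_rng(0)
S, A, gamma, m = 5, 3, 0.9, 4
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)  # normalize each row into a distribution
R = rng.random((S, A))

Q1 = rng.normal(size=(S, A))
Q2 = rng.normal(size=(S, A))

# Sup-norm distance before and after applying the m-step operator.
gap_before = np.abs(Q1 - Q2).max()
gap_after = np.abs(
    multi_bellman(Q1, P, R, gamma, m) - multi_bellman(Q2, P, R, gamma, m)
).max()

print(f"gap before: {gap_before:.4f}")
print(f"gap after:  {gap_after:.4f}")
print(f"bound gamma^m * gap before: {gamma**m * gap_before:.4f}")
```

Under this reading, a larger `m` shrinks the sup-norm contraction factor $\gamma^m$ of the unprojected operator, which gives some intuition for why the projected version might contract when the projected one-step operator does not.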

Cite

Text

Carvalho et al. "Multi-Bellman Operator for Convergence of $Q$-Learning with Linear Function Approximation." Transactions on Machine Learning Research, 2025.

Markdown

[Carvalho et al. "Multi-Bellman Operator for Convergence of $Q$-Learning with Linear Function Approximation." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/carvalho2025tmlr-multibellman/)

BibTeX

@article{carvalho2025tmlr-multibellman,
  title     = {{Multi-Bellman Operator for Convergence of $Q$-Learning with Linear Function Approximation}},
  author    = {Carvalho, Diogo S. and Santos, Pedro A. and Melo, Francisco S.},
  journal   = {Transactions on Machine Learning Research},
  year      = {2025},
  url       = {https://mlanthology.org/tmlr/2025/carvalho2025tmlr-multibellman/}
}