Accelerating Quantum Reinforcement Learning with a Quantum Natural Policy Gradient Based Approach

Abstract

We address the problem of quantum reinforcement learning (QRL) in the model-free setting with quantum oracle access to the Markov Decision Process (MDP). This paper introduces a Quantum Natural Policy Gradient (QNPG) algorithm, which replaces the random sampling used in classical Natural Policy Gradient (NPG) estimators with a deterministic gradient estimation approach, enabling seamless integration into quantum systems. While this modification introduces a bounded bias in the estimator, the bias decays exponentially with increasing truncation level. This paper demonstrates that the proposed QNPG algorithm achieves a sample complexity of $\tilde{\mathcal{O}}(\epsilon^{-1.5})$ for queries to the quantum oracle, significantly improving on the classical lower bound of $\tilde{\mathcal{O}}(\epsilon^{-2})$ for queries to the MDP.
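The exponential bias decay mentioned in the abstract can be illustrated with a standard discounted-return argument (this sketch is not from the paper itself): truncating a discounted sum at horizon $K$ leaves a geometric tail bounded by $\gamma^K/(1-\gamma)$ times the reward bound, so the truncation bias shrinks exponentially in $K$. The reward stream and discount factor below are illustrative assumptions.

```python
# Illustrative sketch (not the paper's estimator): the bias from
# truncating a discounted sum at horizon K is bounded by the
# geometric tail gamma^K / (1 - gamma) when rewards lie in [0, 1].

def truncated_return(rewards, gamma, K):
    """Discounted sum of the first K rewards."""
    return sum(gamma**t * r for t, r in zip(range(K), rewards))

gamma = 0.9
rewards = [1.0] * 200            # constant unit rewards, for illustration
full = truncated_return(rewards, gamma, 200)  # effectively infinite horizon

for K in (5, 10, 20, 40):
    bias = abs(full - truncated_return(rewards, gamma, K))
    bound = gamma**K / (1 - gamma)  # geometric tail bound on the bias
    print(f"K={K:2d}  bias={bias:.6f}  bound={bound:.6f}")
```

Each printed bias sits below its tail bound, and both halve roughly every $\log(2)/\log(1/\gamma)$ steps of $K$, matching the exponential decay claimed for the truncated estimator.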

Cite

Text

Xu and Aggarwal. "Accelerating Quantum Reinforcement Learning with a Quantum Natural Policy Gradient Based Approach." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Xu and Aggarwal. "Accelerating Quantum Reinforcement Learning with a Quantum Natural Policy Gradient Based Approach." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/xu2025icml-accelerating/)

BibTeX

@inproceedings{xu2025icml-accelerating,
  title     = {{Accelerating Quantum Reinforcement Learning with a Quantum Natural Policy Gradient Based Approach}},
  author    = {Xu, Yang and Aggarwal, Vaneet},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {69059-69077},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/xu2025icml-accelerating/}
}