Rank-1 Approximation of Inverse Fisher for Natural Policy Gradients in Deep Reinforcement Learning

Abstract

Natural gradients have long been studied in deep reinforcement learning for their fast convergence properties and covariant weight updates. However, computing natural gradients requires inverting the Fisher Information Matrix (FIM) at each iteration, which is computationally prohibitive. In this paper, we present an efficient and scalable natural policy optimization technique that leverages a rank-1 approximation to the full inverse FIM. We show theoretically that, under certain conditions, a rank-1 approximation to the inverse FIM converges faster than policy gradients and, under some conditions, enjoys the same sample complexity as stochastic policy gradient methods. We benchmark our method on a diverse set of environments and show that it achieves superior performance to standard trust-region and actor-critic baselines.

Cite

Text

Huo et al. "Rank-1 Approximation of Inverse Fisher for Natural Policy Gradients in Deep Reinforcement Learning." Transactions on Machine Learning Research, 2026.

Markdown

[Huo et al. "Rank-1 Approximation of Inverse Fisher for Natural Policy Gradients in Deep Reinforcement Learning." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/huo2026tmlr-rank1/)

BibTeX

@article{huo2026tmlr-rank1,
  title     = {{Rank-1 Approximation of Inverse Fisher for Natural Policy Gradients in Deep Reinforcement Learning}},
  author    = {Huo, Yingxiao and Dash, Satya Prakash and Stoican, Radu and Kaski, Samuel and Sun, Mingfei},
  journal   = {Transactions on Machine Learning Research},
  year      = {2026},
  url       = {https://mlanthology.org/tmlr/2026/huo2026tmlr-rank1/}
}