Sampling Complexity of TD and PPO in RKHS

Zou, Lu; Ren, Wendi; Zhang, Weizhong; Ding, Liang; Li, Shuang



ICLR 2026


Abstract

We revisit Proximal Policy Optimization (PPO) from a function-space perspective. Our analysis decouples policy evaluation and policy improvement in a reproducing kernel Hilbert space (RKHS): (i) a kernelized temporal-difference (TD) critic performs efficient RKHS-gradient updates using only one-step state–action transition samples; (ii) a KL-regularized, natural-gradient policy step exponentiates the evaluated action-value function, recovering a PPO/TRPO-style proximal update in continuous state–action spaces. We provide non-asymptotic, instance-adaptive guarantees whose rates depend on RKHS entropy, unifying the tabular, linear, Sobolev, Gaussian, and Neural Tangent Kernel (NTK) regimes, and we derive a sampling rule for the proximal update that ensures the optimal $k^{-1/2}$ convergence rate for stochastic optimization. Empirically, the theory-aligned schedule improves stability and sample efficiency on common control tasks (e.g., CartPole, Acrobot, and HalfCheetah), while our TD-based critic attains favorable throughput relative to a GAE baseline. Altogether, our results place PPO on a firmer theoretical footing beyond finite-dimensional assumptions and clarify when RKHS-proximal updates with kernel-TD critics yield global policy improvement with practical efficiency.
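
The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a Gaussian kernel, a textbook functional TD(0) update, and a finite action set for the policy step, whereas the paper treats continuous state–action spaces. All names (`gaussian_kernel`, `KernelTDCritic`, `proximal_policy_step`) and the parameters `bandwidth`, `lr`, and `eta` are hypothetical.

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two state-action feature vectors."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * bandwidth ** 2)))

class KernelTDCritic:
    """Q is stored as a kernel expansion: Q(x) = sum_i w_i * k(c_i, x)."""

    def __init__(self, gamma=0.9, lr=0.5):
        self.gamma = gamma   # discount factor (assumed value)
        self.lr = lr         # step size (assumed value)
        self.centers = []    # visited state-action points c_i
        self.weights = []    # expansion coefficients w_i

    def q(self, x):
        """Evaluate the current action-value estimate at x."""
        return sum(w * gaussian_kernel(c, x)
                   for c, w in zip(self.centers, self.weights))

    def update(self, x, reward, x_next):
        """One functional TD(0) step from a single one-step transition.

        The RKHS semi-gradient direction is td_error * k(x, .), so the
        update appends one new kernel center with weight lr * td_error.
        """
        td_error = reward + self.gamma * self.q(x_next) - self.q(x)
        self.centers.append(np.asarray(x, dtype=float))
        self.weights.append(self.lr * td_error)
        return td_error

def proximal_policy_step(pi_old, q_values, eta):
    """KL-regularized step: pi_new(a) is proportional to pi_old(a) * exp(eta * Q(a))."""
    logits = np.log(pi_old) + eta * np.asarray(q_values, dtype=float)
    logits -= logits.max()  # subtract the max for numerical stability
    pi_new = np.exp(logits)
    return pi_new / pi_new.sum()

# Usage: one critic update, then one policy step at a single state.
critic = KernelTDCritic()
td = critic.update(x=[0.0, 1.0], reward=1.0, x_next=[1.0, 0.0])
pi_new = proximal_policy_step(np.array([0.5, 0.5]), [1.0, 0.0], eta=1.0)
```

In this sketch the critic grows by one kernel center per sample, and the policy step shifts probability toward actions with higher estimated value while staying close to the previous policy; `eta` controls how far the step moves.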


Cite

Text

Zou et al. "Sampling Complexity of TD and PPO in RKHS." International Conference on Learning Representations, 2026.

Markdown

[Zou et al. "Sampling Complexity of TD and PPO in RKHS." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zou2026iclr-sampling/)

BibTeX

@inproceedings{zou2026iclr-sampling,
  title     = {{Sampling Complexity of TD and PPO in RKHS}},
  author    = {Zou, Lu and Ren, Wendi and Zhang, Weizhong and Ding, Liang and Li, Shuang},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/zou2026iclr-sampling/}
}