A Temporal Difference Method for Stochastic Continuous Dynamics

Abstract

For continuous systems modeled by dynamical equations such as ODEs and SDEs, Bellman's principle of optimality takes the form of the Hamilton-Jacobi-Bellman (HJB) equation, which provides the theoretical target of reinforcement learning (RL). Although recent advances in RL successfully leverage this formulation, existing methods typically assume the underlying dynamics are known a priori, because they need explicit access to the drift and diffusion coefficients to update the value function according to the HJB equation. To address this inherent limitation of HJB-based RL, we propose a model-free approach that still targets the HJB equation, together with the corresponding temporal difference method. We prove exponential stability of the induced continuous-time dynamics and empirically demonstrate the resulting advantages over transition-kernel-based formulations. The proposed formulation paves the way toward bridging stochastic control and model-free reinforcement learning.
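For context, the HJB equation referenced above is the continuous-time counterpart of the Bellman equation. In a standard discounted-control form for a controlled diffusion dX_t = b(X_t, a_t) dt + σ(X_t, a_t) dW_t with reward rate r and discount rate ρ > 0 (a generic textbook statement; the paper's exact formulation and notation may differ), it reads

\[
\rho V(x) \;=\; \sup_{a}\left\{\, r(x,a) \;+\; b(x,a)^{\top}\nabla V(x) \;+\; \tfrac{1}{2}\,\operatorname{tr}\!\big(\sigma(x,a)\sigma(x,a)^{\top}\,\nabla^{2} V(x)\big) \right\}.
\]

The drift b and diffusion σ appear explicitly in this equation, which is why methods that update the value function directly from it require a known model; a model-free temporal difference method must instead estimate the same target from sampled trajectories alone.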

Cite

Text

Settai et al. "A Temporal Difference Method for Stochastic Continuous Dynamics." Advances in Neural Information Processing Systems, 2025.

Markdown

[Settai et al. "A Temporal Difference Method for Stochastic Continuous Dynamics." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/settai2025neurips-temporal/)

BibTeX

@inproceedings{settai2025neurips-temporal,
  title     = {{A Temporal Difference Method for Stochastic Continuous Dynamics}},
  author    = {Settai, Haruki and Takeishi, Naoya and Yairi, Takehisa},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/settai2025neurips-temporal/}
}