Diffusion Self-Weighted Guidance for Offline Reinforcement Learning

Abstract

Offline reinforcement learning (RL) recovers the optimal policy $\pi$ given historical observations of an agent. In practice, $\pi$ is modeled as a weighted version of the agent's behavior policy $\mu$, where a weight function $w$ acts as a critic of the agent's behavior. Although recent approaches to offline RL based on diffusion models (DMs) have exhibited promising results, they require training a separate guidance network to compute the required scores, which is challenging because these scores depend on the unknown $w$. In this work, we construct a diffusion model over both the actions and the weights, to explore a more streamlined DM-based approach to offline RL. In the proposed setting, the required scores are obtained directly from the diffusion model without learning additional networks. Our main conceptual contribution is a novel exact guidance method in which guidance comes from the same diffusion model; our proposal is therefore termed Self-Weighted Guidance (SWG). Through an experimental proof of concept for SWG, we show that the proposed method i) generates samples from the desired distribution on toy examples, ii) performs competitively against state-of-the-art methods on D4RL when using resampling, and iii) exhibits robustness and scalability in ablation studies.
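
The weighted-policy formulation above can be illustrated on a toy problem. The sketch below is not the SWG method and uses no diffusion model. It only shows what "a weighted version of the behavior policy" means, read here as $\pi(a) \propto \mu(a)\, w(a)$ (an assumed form, with the state dropped for brevity). The mixture-of-Gaussians behavior policy, the value function $Q(a) = a$, the temperature `beta`, and all function names are hypothetical choices made for illustration. Samples from $\mu$ are resampled in proportion to their weights, a standard self-normalized importance resampling step.

```python
import math
import random


def sample_behavior(rng, n):
    """Draw n actions from a toy 1-D behavior policy mu (assumption):
    an equal mixture of N(-1, 0.3^2) and N(+1, 0.3^2)."""
    return [rng.gauss(rng.choice((-1.0, 1.0)), 0.3) for _ in range(n)]


def weight(action, beta=3.0):
    """Hypothetical critic weight w(a) = exp(beta * Q(a)), with toy Q(a) = a."""
    return math.exp(beta * action)


def resample(rng, actions, weights, k):
    """Draw k actions with probability proportional to their weights,
    approximating samples from pi(a) proportional to mu(a) * w(a)."""
    return rng.choices(actions, weights=weights, k=k)


rng = random.Random(0)
actions = sample_behavior(rng, 5000)          # samples from mu
weights = [weight(a) for a in actions]        # critic weights w(a)
policy_actions = resample(rng, actions, weights, 5000)

behavior_mean = sum(actions) / len(actions)
policy_mean = sum(policy_actions) / len(policy_actions)
print(f"behavior mean: {behavior_mean:.3f}, weighted-policy mean: {policy_mean:.3f}")
```

Because the toy weight grows with the action, the resampled actions concentrate on the higher-valued mode of $\mu$, while the behavior samples stay roughly balanced between the two modes.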

Cite

Text

Tagle et al. "Diffusion Self-Weighted Guidance for Offline Reinforcement Learning." Transactions on Machine Learning Research, 2025.

Markdown

[Tagle et al. "Diffusion Self-Weighted Guidance for Offline Reinforcement Learning." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/tagle2025tmlr-diffusion/)

BibTeX

@article{tagle2025tmlr-diffusion,
  title     = {{Diffusion Self-Weighted Guidance for Offline Reinforcement Learning}},
  author    = {Tagle, Augusto and Ruiz-del-Solar, Javier and Tobar, Felipe},
  journal   = {Transactions on Machine Learning Research},
  year      = {2025},
  url       = {https://mlanthology.org/tmlr/2025/tagle2025tmlr-diffusion/}
}