Accelerating Diffusion Planners in Offline RL via Reward-Aware Consistency Trajectory Distillation

Abstract

Although diffusion models have achieved strong results in decision-making tasks, their slow inference speed remains a key limitation. While consistency models offer a potential solution, existing applications to decision-making either struggle with suboptimal demonstrations under behavior cloning or rely on complex concurrent training of multiple networks under the actor-critic framework. In this work, we propose a novel approach to consistency distillation for offline reinforcement learning that directly incorporates reward optimization into the distillation process. Our method achieves single-step sampling while generating higher-reward action trajectories through decoupled training and noise-free reward signals. Empirical evaluations on the Gym MuJoCo, FrankaKitchen, and long-horizon planning benchmarks demonstrate that our approach achieves a $9.7\%$ improvement over the previous state of the art while offering up to a $142\times$ inference-time speedup over its diffusion counterparts.
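To make the idea of folding reward optimization into distillation concrete, the following is a minimal sketch and not the paper's actual objective, which the abstract does not specify. It assumes the student's single-step output is a clean (noise-free) trajectory, that a separately trained, frozen reward function scores that trajectory, and that the two terms are combined with a scalar weight. The names `reward_aware_distillation_loss`, `reward_fn`, and `reward_weight` are hypothetical.

```python
import numpy as np

def reward_aware_distillation_loss(student_pred, teacher_target, reward_fn,
                                   reward_weight=1.0):
    """Illustrative combined loss (assumed form, not taken from the paper).

    student_pred   : clean trajectory produced by the student in one step
    teacher_target : trajectory the student should match for consistency
    reward_fn      : hypothetical frozen reward model, maps a trajectory to a scalar
    reward_weight  : hypothetical trade-off coefficient between the two terms
    """
    # Consistency term: mean squared error between student and target.
    consistency = float(np.mean((student_pred - teacher_target) ** 2))
    # Reward term: evaluated on the student's noise-free prediction,
    # so the reward model never sees a noisy input.
    reward = float(reward_fn(student_pred))
    # Minimizing this loss trades consistency against higher predicted reward.
    return consistency - reward_weight * reward
```

Because the reward function is only called, never updated, inside this loss, it can be trained beforehand and kept fixed, which is one way to read the "decoupled training" mentioned above.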

Cite

Text

Duan et al. "Accelerating Diffusion Planners in Offline RL via Reward-Aware Consistency Trajectory Distillation." International Conference on Learning Representations, 2026.

Markdown

[Duan et al. "Accelerating Diffusion Planners in Offline RL via Reward-Aware Consistency Trajectory Distillation." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/duan2026iclr-accelerating/)

BibTeX

@inproceedings{duan2026iclr-accelerating,
  title     = {{Accelerating Diffusion Planners in Offline RL via Reward-Aware Consistency Trajectory Distillation}},
  author    = {Duan, Xintong and He, Yutong and Tajwar, Fahim and Salakhutdinov, Ruslan and Kolter, J Zico and Schneider, Jeff},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/duan2026iclr-accelerating/}
}