Accelerating Diffusion Planners in Offline RL via Reward-Aware Consistency Trajectory Distillation

Duan, Xintong; He, Yutong; Tajwar, Fahim; Salakhutdinov, Ruslan; Kolter, J Zico; Schneider, Jeff

ICLR 2026

Abstract

Although diffusion models have achieved strong results in decision-making tasks, their slow inference speed remains a key limitation. While consistency models offer a potential solution, existing applications to decision-making either struggle with suboptimal demonstrations under behavior cloning or rely on complex concurrent training of multiple networks under the actor-critic framework. In this work, we propose a novel approach to consistency distillation for offline reinforcement learning that directly incorporates reward optimization into the distillation process. Our method achieves single-step sampling while generating higher-reward action trajectories through decoupled training and noise-free reward signals. Empirical evaluations on the Gym MuJoCo, FrankaKitchen, and long-horizon planning benchmarks demonstrate that our approach can achieve a 9.7% improvement over the previous state of the art while offering up to $142\times$ speedup over diffusion counterparts in inference time.

Cite

Text

Duan et al. "Accelerating Diffusion Planners in Offline RL via Reward-Aware Consistency Trajectory Distillation." International Conference on Learning Representations, 2026.

Markdown

[Duan et al. "Accelerating Diffusion Planners in Offline RL via Reward-Aware Consistency Trajectory Distillation." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/duan2026iclr-accelerating/)

BibTeX

@inproceedings{duan2026iclr-accelerating,
  title     = {{Accelerating Diffusion Planners in Offline RL via Reward-Aware Consistency Trajectory Distillation}},
  author    = {Duan, Xintong and He, Yutong and Tajwar, Fahim and Salakhutdinov, Ruslan and Kolter, J Zico and Schneider, Jeff},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/duan2026iclr-accelerating/}
}