Discrete Diffusion Reward Guidance Methods for Offline Reinforcement Learning

Abstract

As reinforcement learning challenges involve larger amounts of data in more varied forms, new techniques will be required to generate high-quality plans from only a compact representation of the original information. While recent diffusion generative policies provide a way to model complex action distributions directly in the original, high-dimensional feature space, they suffer from slow inference and have not yet been applied to reduced-dimensional representations or to discrete tasks. In this work, we propose three diffusion-guidance techniques that operate on a reduced representation of the state obtained by quantile discretization: a gradient-based approach, a stochastic beam search approach, and a Q-learning approach. Our findings indicate that the gradient-based and beam search approaches improve scores on an offline reinforcement learning task by a significant margin.
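The abstract mentions reducing the state representation via quantile discretization before applying discrete diffusion guidance. The sketch below is a minimal, illustrative example of that general idea, not the authors' implementation: it fits per-dimension quantile bin edges from an offline dataset and maps continuous states to discrete bin indices (tokens). All function names, the bin count, and the toy data are assumptions for illustration.

```python
# Illustrative sketch (not the paper's code): quantile discretization of
# continuous states into per-dimension token indices, one possible way to
# obtain a compact discrete representation for a discrete diffusion model.
import numpy as np

def fit_quantile_bins(states: np.ndarray, n_bins: int = 32) -> np.ndarray:
    """Compute per-dimension quantile bin edges from an offline dataset.

    states: array of shape (num_samples, state_dim)
    returns: bin edges of shape (state_dim, n_bins - 1)
    """
    quantiles = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]   # interior quantiles
    return np.quantile(states, quantiles, axis=0).T        # (state_dim, n_bins - 1)

def discretize(states: np.ndarray, edges: np.ndarray) -> np.ndarray:
    """Map each continuous state dimension to a discrete bin index (token)."""
    return np.stack(
        [np.digitize(states[:, d], edges[d]) for d in range(states.shape[1])],
        axis=1,
    )

# Example usage: tokenize a batch of states so a discrete model can operate on
# bin indices instead of raw high-dimensional features (toy data, assumed dims).
rng = np.random.default_rng(0)
dataset_states = rng.normal(size=(10_000, 17))
edges = fit_quantile_bins(dataset_states, n_bins=32)
tokens = discretize(dataset_states[:8], edges)   # shape (8, 17), ints in [0, 31]
```

Quantile (rather than uniform) bin edges allocate resolution where the offline data is dense, which is one common motivation for this style of discretization.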

Cite

Text

Coleman et al. "Discrete Diffusion Reward Guidance Methods for Offline Reinforcement Learning." ICML 2023 Workshops: SODS, 2023.

Markdown

[Coleman et al. "Discrete Diffusion Reward Guidance Methods for Offline Reinforcement Learning." ICML 2023 Workshops: SODS, 2023.](https://mlanthology.org/icmlw/2023/coleman2023icmlw-discrete/)

BibTeX

@inproceedings{coleman2023icmlw-discrete,
  title     = {{Discrete Diffusion Reward Guidance Methods for Offline Reinforcement Learning}},
  author    = {Coleman, Matthew and Russakovsky, Olga and Allen-Blanchette, Christine and Zhu, Ye},
  booktitle = {ICML 2023 Workshops: SODS},
  year      = {2023},
  url       = {https://mlanthology.org/icmlw/2023/coleman2023icmlw-discrete/}
}