Discrete Diffusion Reward Guidance Methods for Offline Reinforcement Learning

Abstract

As reinforcement learning challenges involve larger amounts of data in more varied forms, new techniques will be required to generate high-quality plans from only a compact representation of the original information. While recent diffusion generative policies provide a way to model complex action distributions directly in the original, high-dimensional feature space, they suffer from slow inference and have not yet been applied to reduced-dimensional representations or to discrete tasks. In this work, we propose three diffusion-guidance techniques that operate on a reduced representation of the state obtained by quantile discretization: a gradient-based approach, a stochastic beam search approach, and a Q-learning approach. Our findings indicate that the gradient-based and beam search approaches improve scores on an offline reinforcement learning task by a significant margin.
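The abstract mentions reducing the state representation via quantile discretization before applying discrete diffusion guidance. The sketch below is a minimal, illustrative example of that general idea, not the authors' implementation: it fits per-dimension quantile bin edges from an offline dataset and maps continuous states to discrete bin indices (tokens). All function names, the bin count, and the toy data are assumptions for illustration.

```python
# Illustrative sketch (not the paper's code): quantile discretization of
# continuous states into per-dimension token indices, one possible way to
# obtain a compact discrete representation for a discrete diffusion model.
import numpy as np

def fit_quantile_bins(states: np.ndarray, n_bins: int = 32) -> np.ndarray:
    """Compute per-dimension quantile bin edges from an offline dataset.

    states: array of shape (num_samples, state_dim)
    returns: bin edges of shape (state_dim, n_bins - 1)
    """
    quantiles = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]   # interior quantiles
    return np.quantile(states, quantiles, axis=0).T        # (state_dim, n_bins - 1)

def discretize(states: np.ndarray, edges: np.ndarray) -> np.ndarray:
    """Map each continuous state dimension to a discrete bin index (token)."""
    return np.stack(
        [np.digitize(states[:, d], edges[d]) for d in range(states.shape[1])],
        axis=1,
    )

# Example usage: tokenize a batch of states so a discrete model can operate on
# bin indices instead of raw high-dimensional features (toy data, assumed dims).
rng = np.random.default_rng(0)
dataset_states = rng.normal(size=(10_000, 17))
edges = fit_quantile_bins(dataset_states, n_bins=32)
tokens = discretize(dataset_states[:8], edges)   # shape (8, 17), ints in [0, 31]
```

Quantile (rather than uniform) bin edges allocate resolution where the offline data is dense, which is one common motivation for this style of discretization.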

Cite

Text

Coleman et al. "Discrete Diffusion Reward Guidance Methods for Offline Reinforcement Learning." ICML 2023 Workshops: SODS, 2023.

Markdown

[Coleman et al. "Discrete Diffusion Reward Guidance Methods for Offline Reinforcement Learning." ICML 2023 Workshops: SODS, 2023.](https://mlanthology.org/icmlw/2023/coleman2023icmlw-discrete/)

BibTeX

@inproceedings{coleman2023icmlw-discrete,
  title     = {{Discrete Diffusion Reward Guidance Methods for Offline Reinforcement Learning}},
  author    = {Coleman, Matthew and Russakovsky, Olga and Allen-Blanchette, Christine and Zhu, Ye},
  booktitle = {ICML 2023 Workshops: SODS},
  year      = {2023},
  url       = {https://mlanthology.org/icmlw/2023/coleman2023icmlw-discrete/}
}