Waypoint Transformer: Reinforcement Learning via Supervised Learning with Intermediate Targets

Abstract

Despite recent advancements in offline reinforcement learning via supervised learning (RvS) methods and the success of the decision transformer (DT) architecture in various domains, DTs fall short on several challenging benchmarks. The root cause of this underperformance is their inability to seamlessly connect, i.e., stitch, segments of suboptimal trajectories. To overcome this limitation, we present a novel approach that enhances RvS methods by integrating intermediate targets. We introduce the waypoint transformer (WT), which uses an architecture that builds upon the DT framework and is further conditioned on dynamically generated waypoints. The results show a significant improvement in final return compared to existing RvS methods, with performance on par with or greater than existing temporal difference learning-based methods. Additionally, performance and stability are significantly improved in the most challenging environments and data configurations, including AntMaze Large Play/Diverse and Kitchen Mixed/Partial.
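To make the idea of conditioning on intermediate targets concrete, below is a minimal PyTorch sketch (not the authors' implementation) of a waypoint-conditioned transformer policy. The module names (WaypointGenerator, WaypointConditionedPolicy), dimensions, and the simple MLP waypoint generator are illustrative assumptions; the abstract only states that the policy builds on the DT framework and is conditioned on dynamically generated waypoints.

import torch
import torch.nn as nn

class WaypointGenerator(nn.Module):
    # Proposes an intermediate target (waypoint) from the current state and the final goal.
    def __init__(self, state_dim, goal_dim, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, goal_dim),
        )

    def forward(self, state, goal):
        return self.net(torch.cat([state, goal], dim=-1))

class WaypointConditionedPolicy(nn.Module):
    # Causal transformer over (state, waypoint) tokens, predicting one action per step.
    def __init__(self, state_dim, goal_dim, act_dim, d_model=128, n_layers=2, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(state_dim + goal_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.action_head = nn.Linear(d_model, act_dim)

    def forward(self, states, waypoints):
        # states: (B, T, state_dim), waypoints: (B, T, goal_dim)
        tokens = self.embed(torch.cat([states, waypoints], dim=-1))
        T = states.size(1)
        causal_mask = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        return self.action_head(self.encoder(tokens, mask=causal_mask))

# Example forward pass with toy dimensions (all values here are placeholders).
states = torch.randn(8, 10, 17)   # batch of 8 trajectories, 10 steps, 17-dim states
goals = torch.randn(8, 10, 2)     # 2-dim goals, e.g., a target x-y position
waypoints = WaypointGenerator(17, 2)(states, goals)
actions = WaypointConditionedPolicy(17, 2, 6)(states, waypoints)  # (8, 10, 6)

In a full pipeline one would, for instance, train the waypoint generator to predict states a fixed number of steps ahead along logged trajectories and fit the policy with a behavior-cloning loss on the logged actions; these training choices are assumptions for illustration, and the paper should be consulted for the actual procedure.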

Cite

Text

Badrinath et al. "Waypoint Transformer: Reinforcement Learning via Supervised Learning with Intermediate Targets." NeurIPS 2023 Workshops: GCRL, 2023.

Markdown

[Badrinath et al. "Waypoint Transformer: Reinforcement Learning via Supervised Learning with Intermediate Targets." NeurIPS 2023 Workshops: GCRL, 2023.](https://mlanthology.org/neuripsw/2023/badrinath2023neuripsw-waypoint/)

BibTeX

@inproceedings{badrinath2023neuripsw-waypoint,
  title     = {{Waypoint Transformer: Reinforcement Learning via Supervised Learning with Intermediate Targets}},
  author    = {Badrinath, Anirudhan and Nie, Allen and Flet-Berliac, Yannis and Brunskill, Emma},
  booktitle = {NeurIPS 2023 Workshops: GCRL},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/badrinath2023neuripsw-waypoint/}
}