Policy-Shaped Prediction: Improving World Modeling Through Interpretability

Abstract

Model-based reinforcement learning (MBRL) offers sample-efficient policy optimization but is susceptible to distraction by task-irrelevant stimuli. We address this by developing Policy-Shaped Prediction (PSP), a method that enables agents to interpret their own policies and shape their world models accordingly. By combining gradient-based interpretability, pretrained segmentation models, and adversarial learning, PSP outperforms existing distractor-reduction approaches. This work represents an interpretability-driven advance towards robust MBRL.

Cite

Text

Hutson et al. "Policy-Shaped Prediction: Improving World Modeling Through Interpretability." NeurIPS 2024 Workshops: InterpretableAI, 2024.

Markdown

[Hutson et al. "Policy-Shaped Prediction: Improving World Modeling Through Interpretability." NeurIPS 2024 Workshops: InterpretableAI, 2024.](https://mlanthology.org/neuripsw/2024/hutson2024neuripsw-policyshaped/)

BibTeX

@inproceedings{hutson2024neuripsw-policyshaped,
  title     = {{Policy-Shaped Prediction: Improving World Modeling Through Interpretability}},
  author    = {Hutson, Miles Richard and Kauvar, Isaac and Haber, Nick},
  booktitle = {NeurIPS 2024 Workshops: InterpretableAI},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/hutson2024neuripsw-policyshaped/}
}