Rejecting Hallucinated State Targets During Planning

Abstract

In the planning processes of decision-making agents, generative or predictive models are often used as "generators" that propose "targets" representing sets of expected or desirable states. Unfortunately, learned models inevitably hallucinate infeasible targets, which can cause delusional behaviors and raise safety concerns. We first investigate the kinds of infeasible targets that generators can hallucinate. We then devise a strategy to identify and reject infeasible targets by learning a target feasibility evaluator. To make the evaluator robust and non-delusional, we adopt a design that combines an off-policy-compatible learning rule, a distributional architecture, and data augmentation based on hindsight relabeling. Attached to a planning agent, the evaluator learns by observing the agent's interactions with the environment and the targets produced by its generator, without requiring any changes to the agent or the generator itself. Our controlled experiments show significant reductions in delusional behaviors and performance improvements for various kinds of existing agents.
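
To make the idea concrete, below is a minimal, hypothetical sketch of a feasibility evaluator trained with hindsight-relabeled data and used to filter generator proposals at planning time. All names (FeasibilityEvaluator, hindsight_positives, reject_infeasible), the shuffled-pair negatives, the 0.5 threshold, and the tensor shapes are illustrative assumptions, not the paper's implementation; the paper's off-policy-compatible learning rule and distributional architecture are omitted here for brevity.

import torch
import torch.nn as nn

class FeasibilityEvaluator(nn.Module):
    # Scores (state, target) pairs; higher logits mean the target looks reachable.
    def __init__(self, state_dim, target_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + target_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, target):
        return self.net(torch.cat([state, target], dim=-1)).squeeze(-1)

def hindsight_positives(trajectory):
    # Hindsight relabeling: pair each visited state with a randomly chosen later
    # state from the same trajectory; such pairs are feasible by construction.
    T = trajectory.shape[0]
    idx = torch.arange(T - 1)
    offsets = (torch.rand(T - 1) * (T - 1 - idx)).long() + 1  # uniform in [1, T-1-i]
    return trajectory[idx], trajectory[idx + offsets]

def reject_infeasible(evaluator, state, candidates, threshold=0.5):
    # Planning-time filter: keep only generator proposals the evaluator accepts.
    with torch.no_grad():
        scores = torch.sigmoid(evaluator(state.expand(len(candidates), -1), candidates))
    return candidates[scores >= threshold]

# One illustrative training step on synthetic data (shapes are arbitrary).
evaluator = FeasibilityEvaluator(state_dim=8, target_dim=8)
optimizer = torch.optim.Adam(evaluator.parameters(), lr=1e-3)
traj = torch.randn(50, 8)                 # stand-in for one logged trajectory
pos_s, pos_t = hindsight_positives(traj)  # feasible targets (label 1)
neg_s = pos_s
neg_t = pos_t[torch.randperm(len(pos_t))]  # crude stand-in for hallucinated targets (label 0)
states = torch.cat([pos_s, neg_s])
targets = torch.cat([pos_t, neg_t])
labels = torch.cat([torch.ones(len(pos_s)), torch.zeros(len(neg_s))])
loss = nn.functional.binary_cross_entropy_with_logits(evaluator(states, targets), labels)
optimizer.zero_grad(); loss.backward(); optimizer.step()

Because the evaluator only consumes logged interactions and generator outputs, it can sit alongside an existing planning agent as in the paper's setup; the binary-classification loss above is a simplification of the learning rule the paper actually uses.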

Cite

Text

Zhao et al. "Rejecting Hallucinated State Targets During Planning." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Zhao et al. "Rejecting Hallucinated State Targets During Planning." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/zhao2025icml-rejecting/)

BibTeX

@inproceedings{zhao2025icml-rejecting,
  title     = {{Rejecting Hallucinated State Targets During Planning}},
  author    = {Zhao, Harry and Sylvain, Tristan and Laroche, Romain and Precup, Doina and Bengio, Yoshua},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {77677--77702},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/zhao2025icml-rejecting/}
}