Leveraging Separated World Model for Exploration in Visually Distracted Environments

Abstract

Model-based unsupervised reinforcement learning (URL) has gained prominence because it reduces environment interactions and learns general skills via intrinsic rewards. However, distractors in observations can severely bias intrinsic reward estimation and hence the exploration process, especially in environments with visual inputs such as images or videos. To address this challenge, we propose a bi-level optimization framework named Separation-assisted eXplorer (SeeX). In the inner optimization, SeeX trains a separated world model that disentangles exogenous from endogenous information, minimizing uncertainty to ensure task relevance. In the outer optimization, it learns a policy on imaginary trajectories generated within the endogenous state space to maximize task-relevant uncertainty. Evaluations on multiple locomotion and manipulation tasks demonstrate SeeX's effectiveness.

Cite

Text

Huang et al. "Leveraging Separated World Model for Exploration in Visually Distracted Environments." Neural Information Processing Systems, 2024. doi:10.52202/079017-2618

Markdown

[Huang et al. "Leveraging Separated World Model for Exploration in Visually Distracted Environments." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/huang2024neurips-leveraging/) doi:10.52202/079017-2618

BibTeX

@inproceedings{huang2024neurips-leveraging,
  title     = {{Leveraging Separated World Model for Exploration in Visually Distracted Environments}},
  author    = {Huang, Kaichen and Wan, Shenghua and Shao, Minghao and Sun, Hai-Hang and Gan, Le and Feng, Shuai and Zhan, De-Chuan},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-2618},
  url       = {https://mlanthology.org/neurips/2024/huang2024neurips-leveraging/}
}