Learning Robust Representations for Visual Reinforcement Learning via Task-Relevant Mask Sampling

Abstract

Humans excel at isolating relevant information from noisy data to predict the behavior of dynamic systems, effectively disregarding non-informative, temporally correlated noise. In contrast, existing visual reinforcement learning algorithms face challenges in generating noise-free predictions within high-dimensional, noise-saturated environments, especially when trained on world models featuring realistic background noise extracted from natural video streams. We propose Task Relevant Mask Sampling (TRMS), a novel approach for identifying task-specific and reward-relevant masks. TRMS uses existing segmentation models as a masking prior, followed by a mask selector that dynamically identifies a subset of masks at each timestep, selecting those most likely to contribute to task-specific rewards. To mitigate the high computational cost associated with these masking priors, a lightweight student network is trained in parallel. This network learns to perform masking independently and replaces the Segment Anything Model (SAM)-based teacher network after a brief initial phase (<10-25% of total training). TRMS enhances the generalization capabilities of Soft Actor-Critic agents under distractions and achieves better performance on the RL-Vigen benchmark, which includes challenging variants of the DeepMind Control Suite, Dexterous Manipulation and Quadruped Locomotion tasks.
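The mask-selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the selector scores or samples masks, so everything here is an assumption made for illustration: the per-mask relevance `scores`, the top-k rule, and the function names `select_masks` and `apply_mask` are hypothetical and are not taken from the paper.

```python
import numpy as np


def select_masks(masks, scores, k):
    """Return the union of the k highest-scoring candidate masks.

    masks:  bool array of shape (N, H, W), e.g. segments from a masking prior.
    scores: float array of shape (N,), hypothetical relevance scores
            (assumed here; the abstract does not define them).
    """
    k = min(k, len(scores))
    top = np.argsort(scores)[::-1][:k]  # indices of the k largest scores
    return masks[top].any(axis=0)       # pixel-wise union, shape (H, W)


def apply_mask(frame, mask):
    """Zero out every pixel of an (H, W, C) frame that lies outside the mask."""
    return frame * mask[..., None]


# Toy example: three candidate segments on a 4x4 frame.
masks = np.zeros((3, 4, 4), dtype=bool)
masks[0, :2, :2] = True  # stands in for the agent
masks[1, 2:, 2:] = True  # stands in for a background distractor
masks[2, :2, 2:] = True  # stands in for a task object
scores = np.array([0.9, 0.1, 0.7])  # made-up relevance scores

frame = np.ones((4, 4, 3), dtype=np.float32)
selected = select_masks(masks, scores, k=2)
masked_frame = apply_mask(frame, selected)
```

In this toy setup the low-scoring distractor segment is dropped, and only the pixels covered by the two remaining segments are passed on to the agent's encoder.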

Cite

Text

Dave et al. "Learning Robust Representations for Visual Reinforcement Learning via Task-Relevant Mask Sampling." Transactions on Machine Learning Research, 2025.

Markdown

[Dave et al. "Learning Robust Representations for Visual Reinforcement Learning via Task-Relevant Mask Sampling." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/dave2025tmlr-learning/)

BibTeX

@article{dave2025tmlr-learning,
  title     = {{Learning Robust Representations for Visual Reinforcement Learning via Task-Relevant Mask Sampling}},
  author    = {Dave, Vedant and Özdenizci, Ozan and Rueckert, Elmar},
  journal   = {Transactions on Machine Learning Research},
  year      = {2025},
  url       = {https://mlanthology.org/tmlr/2025/dave2025tmlr-learning/}
}