MOOSS: Mask-Enhanced Temporal Contrastive Learning for Smooth State Evolution in Visual Reinforcement Learning

Abstract

In visual Reinforcement Learning (RL), learning from pixel-based observations poses significant challenges for sample efficiency, primarily due to the complexity of extracting informative state representations from high-dimensional data. Previous methods, such as contrastive-based approaches, have made strides in improving sample efficiency but fall short in modeling the nuanced evolution of states. To address this, we introduce MOOSS, a novel framework that leverages a temporal contrastive objective with the help of graph-based spatial-temporal masking to explicitly model state evolution in visual RL. Specifically, we propose a self-supervised dual-component strategy that integrates (1) a graph construction of pixel-based observations for spatial-temporal masking, coupled with (2) a multi-level contrastive learning mechanism that enriches state representations by emphasizing the temporal continuity and change of states. MOOSS advances the understanding of state dynamics by disrupting and learning from spatial-temporal correlations, which facilitates policy learning. Our comprehensive evaluation on multiple continuous and discrete control benchmarks shows that MOOSS outperforms previous state-of-the-art visual RL methods in terms of sample efficiency, demonstrating the effectiveness of our method.
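The abstract does not specify the graph construction, masking ratio, encoder, or exact loss, so the following is only a rough, hypothetical illustration of the two ideas it names, not the MOOSS implementation. It assumes that each observation is split into an H x W grid of patches, that the patches of T consecutive frames form a 6-connected spatial-temporal grid graph (spatial neighbours within a frame plus the same patch in adjacent frames), that masking removes each random seed patch together with its graph neighbours, and that the temporal contrastive objective is a standard InfoNCE loss in which the next-step embedding is the positive. All function names and parameters (`grid_neighbors`, `spatial_temporal_mask`, `temporal_info_nce`, `num_seeds`, `temperature`) are illustrative, and only a single contrastive level is shown.

```python
import numpy as np


def grid_neighbors(t, i, j, T, H, W):
    """Yield neighbours of patch node (t, i, j) in a HYPOTHETICAL spatial-temporal
    grid graph: 4-connected patches in the same frame, plus the same patch
    position in the previous and next frames."""
    offsets = ((0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1), (1, 0, 0), (-1, 0, 0))
    for dt, di, dj in offsets:
        nt, ni, nj = t + dt, i + di, j + dj
        if 0 <= nt < T and 0 <= ni < H and 0 <= nj < W:
            yield nt, ni, nj


def spatial_temporal_mask(T, H, W, num_seeds, rng):
    """Return a boolean (T, H, W) mask in which each randomly chosen seed patch
    is masked together with its graph neighbours (an assumed masking rule)."""
    mask = np.zeros((T, H, W), dtype=bool)
    seeds = rng.choice(T * H * W, size=num_seeds, replace=False)
    for flat in seeds:
        t, i, j = np.unravel_index(flat, (T, H, W))
        mask[t, i, j] = True
        for nt, ni, nj in grid_neighbors(t, i, j, T, H, W):
            mask[nt, ni, nj] = True
    return mask


def temporal_info_nce(z, temperature=0.1):
    """Standard InfoNCE over a trajectory of state embeddings z with shape (T, D):
    z[t] is the anchor, z[t + 1] is its positive, and the other next-step
    embeddings of the same trajectory serve as negatives."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity via unit vectors
    anchors, targets = z[:-1], z[1:]
    logits = anchors @ targets.T / temperature  # (T-1, T-1); diagonal holds positives
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))  # cross-entropy with the diagonal as targets


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mask = spatial_temporal_mask(T=3, H=4, W=4, num_seeds=2, rng=rng)
    print(int(mask.sum()), "of", mask.size, "patches masked")
    z = rng.normal(size=(5, 8))  # stand-in for encoder outputs of 5 consecutive states
    print(float(temporal_info_nce(z)))
```

Masking whole graph neighbourhoods, rather than independent patches, removes correlated regions across space and time, so an encoder cannot trivially copy a masked patch from an adjacent frame; the InfoNCE term then rewards embeddings whose next state is more similar than other states in the trajectory. How MOOSS actually combines these pieces, and what its multiple contrastive levels are, is described in the paper itself.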

Cite

Text

Sun et al. "MOOSS: Mask-Enhanced Temporal Contrastive Learning for Smooth State Evolution in Visual Reinforcement Learning." Winter Conference on Applications of Computer Vision, 2025.


BibTeX

@inproceedings{sun2025wacv-mooss,
  title     = {{MOOSS: Mask-Enhanced Temporal Contrastive Learning for Smooth State Evolution in Visual Reinforcement Learning}},
  author    = {Sun, Jiarui and Akcal, M. Ugur and Chowdhary, Girish and Zhang, Wei},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2025},
  pages     = {6719--6729},
  url       = {https://mlanthology.org/wacv/2025/sun2025wacv-mooss/}
}