VICtoR: Learning Hierarchical Vision-Instruction Correlation Rewards for Long-Horizon Manipulation

Abstract

We study reward models for long-horizon manipulation by learning from action-free videos and language instructions, which we term the visual-instruction correlation (VIC) problem. Existing VIC methods struggle to learn rewards for long-horizon tasks because they lack sub-stage awareness, have difficulty modeling task complexity, and estimate object states inadequately. To address these challenges, we introduce VICtoR, a novel hierarchical VIC reward model capable of providing effective reward signals for long-horizon manipulation tasks. Trained solely on primitive motion demonstrations, VICtoR provides precise reward signals by assessing task progress at various stages using a novel stage detector and motion progress evaluator. We conducted extensive experiments on both simulated and real-world datasets. The results suggest that VICtoR outperforms the best existing methods, achieving a 43% improvement in success rates for long-horizon tasks. Our project page can be found at https://cmlab-victor.github.io/cmlab-vicotor.github.io/.

Cite

Text

Hung et al. "VICtoR: Learning Hierarchical Vision-Instruction Correlation Rewards for Long-Horizon Manipulation." International Conference on Learning Representations, 2025.

Markdown

[Hung et al. "VICtoR: Learning Hierarchical Vision-Instruction Correlation Rewards for Long-Horizon Manipulation." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/hung2025iclr-victor/)

BibTeX

@inproceedings{hung2025iclr-victor,
  title     = {{VICtoR: Learning Hierarchical Vision-Instruction Correlation Rewards for Long-Horizon Manipulation}},
  author    = {Hung, Kuo-Han and Lo, Pang-Chi and Yeh, Jia-Fong and Hsu, Han-Yuan and Chen, Yi-Ting and Hsu, Winston H.},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/hung2025iclr-victor/}
}