DisPIM: Distilling PreTrained Image Models for Generalizable Visuo-Motor Control
Abstract
We introduce DisPIM, a framework that leverages pretrained image models (PIMs) for visuo-motor control. Applying PIMs to visuo-motor control is difficult because of the distribution shift between the visual states of the environment and the pretraining datasets. Under this shift, fine-tuning PIMs specifically for visuo-motor control may hurt their generalizability, while adding extra tunable parameters for specific actions leads to high computational costs. DisPIM addresses these challenges with a novel feature distillation approach that yields a compact model which not only inherits the generalization capability of PIMs but also acquires task-specific skills for visuo-motor control. This dual benefit is achieved mainly through a target Q-ensemble mechanism inspired by double Q-learning. The mechanism adaptively adjusts the distillation rate to balance generalization against task-specific ability during training. With this balancing mechanism, DisPIM achieves both task-specific and generalizable control at low computational cost. Across a series of algorithms, task domains, and evaluation metrics, in both simulation and on a real robot, DisPIM demonstrates significant improvements in generalization and overall performance with low computational overhead.
Cite
Text
Wang and Wu. "DisPIM: Distilling PreTrained Image Models for Generalizable Visuo-Motor Control." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/978
Markdown
[Wang and Wu. "DisPIM: Distilling PreTrained Image Models for Generalizable Visuo-Motor Control." International Joint Conference on Artificial Intelligence, 2025.](https://mlanthology.org/ijcai/2025/wang2025ijcai-dispim/) doi:10.24963/IJCAI.2025/978
BibTeX
@inproceedings{wang2025ijcai-dispim,
title = {{DisPIM: Distilling PreTrained Image Models for Generalizable Visuo-Motor Control}},
author = {Wang, Haitao and Wu, Hejun},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2025},
pages = {8796-8804},
doi = {10.24963/IJCAI.2025/978},
url = {https://mlanthology.org/ijcai/2025/wang2025ijcai-dispim/}
}