Probing the Mid-Level Vision Capabilities of Self-Supervised Learning
Abstract
Mid-level vision capabilities, such as generic object localization and 3D geometric understanding, are fundamental to human vision and crucial for many real-world applications of computer vision. These abilities emerge with minimal supervision during the early stages of human visual development. Despite their significance, current self-supervised learning (SSL) approaches are primarily designed and evaluated for high-level recognition tasks, leaving their mid-level vision capabilities largely unexamined. In this study, we introduce a suite of benchmark protocols to systematically assess mid-level vision capabilities and present a comprehensive, controlled evaluation of 22 prominent SSL models across 8 mid-level vision tasks. Our experiments reveal a weak correlation between mid-level and high-level task performance. We also identify several SSL methods whose performance is highly imbalanced between mid-level and high-level capabilities, as well as some that excel in both. Additionally, we investigate key factors contributing to mid-level vision performance, such as pretraining objectives and network architectures. Our study provides a holistic and timely view of what SSL models have learned, complementing existing research that primarily focuses on high-level vision tasks. We hope our findings guide future SSL research to benchmark models on mid-level as well as high-level vision tasks.
Cite
Text
Chen et al. "Probing the Mid-Level Vision Capabilities of Self-Supervised Learning." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.02801
Markdown
[Chen et al. "Probing the Mid-Level Vision Capabilities of Self-Supervised Learning." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/chen2025cvpr-probing/) doi:10.1109/CVPR52734.2025.02801
BibTeX
@inproceedings{chen2025cvpr-probing,
title = {{Probing the Mid-Level Vision Capabilities of Self-Supervised Learning}},
author = {Chen, Xuweiyi and Marks, Markus and Cheng, Zezhou},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2025},
pages = {30095--30105},
doi = {10.1109/CVPR52734.2025.02801},
url = {https://mlanthology.org/cvpr/2025/chen2025cvpr-probing/}
}