POA: Pre-Training Once for Models of All Sizes
Abstract
Large-scale self-supervised pre-training has paved the way for one foundation model to handle many different vision tasks. However, most pre-training methodologies train a single model of a fixed size at a time, whereas the varied computation and storage constraints of real-world scenarios demand substantial effort to develop a whole series of models of different sizes for deployment. In this study, we propose a novel tri-branch self-supervised training framework, termed POA (Pre-training Once for All), to tackle this issue. Our approach introduces an innovative elastic student branch into a modern self-distillation paradigm. At each pre-training step, we randomly sample a sub-network from the original student to form the elastic student and train all branches in a self-distilling fashion. Once pre-trained, POA allows the extraction of pre-trained models of diverse sizes for downstream tasks. Remarkably, the elastic student enables the simultaneous pre-training of multiple models of different sizes and also acts as an additional ensemble of models of various sizes that enhances representation learning. Extensive experiments, including k-nearest-neighbor and linear-probing evaluation as well as assessments on multiple downstream tasks, demonstrate the effectiveness and advantages of our POA. It achieves state-of-the-art performance with ViT, Swin Transformer and ResNet backbones, producing around a hundred models of different sizes through a single pre-training session. The code is available at: https://github.com/Qichuzyy/POA.
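To make the tri-branch idea concrete, below is a minimal, hypothetical sketch of one POA-style training step, not the authors' implementation. It uses a toy width-elastic MLP backbone so the elastic student shares parameters with the intact student (the real POA samples ViT/Swin/ResNet sub-networks along width and depth); all names such as `ElasticMLP` and `distill_loss` are illustrative.

```python
# Sketch of a POA-style tri-branch self-distillation step (simplified, assumed details).
import copy
import random
import torch
import torch.nn.functional as F
from torch import nn


class ElasticMLP(nn.Module):
    """Toy width-elastic backbone: a forward pass may use only the first
    `width` hidden units, so every sub-width shares the full model's weights."""

    def __init__(self, in_dim=32, hidden=256, out_dim=64):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, out_dim)

    def forward(self, x, width=None):
        w = width or self.fc1.out_features
        h = F.relu(F.linear(x, self.fc1.weight[:w], self.fc1.bias[:w]))
        return F.linear(h, self.fc2.weight[:, :w], self.fc2.bias)


def distill_loss(student_logits, teacher_probs, temp=0.1):
    # Cross-entropy between teacher soft targets and student predictions.
    return torch.sum(-teacher_probs * F.log_softmax(student_logits / temp, dim=-1), dim=-1).mean()


student = ElasticMLP()
teacher = copy.deepcopy(student)              # EMA teacher branch (no gradients)
for p in teacher.parameters():
    p.requires_grad_(False)
opt = torch.optim.AdamW(student.parameters(), lr=1e-3)

for step in range(100):
    x = torch.randn(16, 32)                   # stand-in for an augmented view
    with torch.no_grad():
        targets = F.softmax(teacher(x) / 0.07, dim=-1)

    width = random.choice([64, 128, 192, 256])  # sample an elastic sub-network
    loss = distill_loss(student(x), targets) \
         + distill_loss(student(x, width=width), targets)

    opt.zero_grad()
    loss.backward()
    opt.step()

    with torch.no_grad():                      # EMA update of the teacher
        for pt, ps in zip(teacher.parameters(), student.parameters()):
            pt.mul_(0.996).add_(ps, alpha=0.004)
```

After such a run, any sampled width (here 64-256 hidden units) can be extracted as a standalone model, which is the spirit of pre-training once for models of all sizes.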
Cite
Text
Zhang et al. "POA: Pre-Training Once for Models of All Sizes." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72646-0_8
Markdown
[Zhang et al. "POA: Pre-Training Once for Models of All Sizes." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/zhang2024eccv-poa/) doi:10.1007/978-3-031-72646-0_8
BibTeX
@inproceedings{zhang2024eccv-poa,
title = {{POA: Pre-Training Once for Models of All Sizes}},
author = {Zhang, Yingying and Guo, Xin and Lao, Jiangwei and Yu, Lei and Ru, Lixiang and Wang, Jian and Ye, Guo and He, Huimei and Chen, Jingdong and Yang, Ming},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2024},
doi = {10.1007/978-3-031-72646-0_8},
url = {https://mlanthology.org/eccv/2024/zhang2024eccv-poa/}
}