Task-Customized Masked Autoencoder via Mixture of Cluster-Conditional Experts

Abstract

Masked Autoencoder (MAE) is a prevailing self-supervised learning method that achieves promising results in model pre-training. However, when downstream tasks have data distributions that differ from the pre-training data, semantically irrelevant pre-training information can cause negative transfer, impeding MAE's scalability. To address this issue, we propose a novel MAE-based pre-training paradigm, Mixture of Cluster-conditional Experts (MoCE), which can be trained once yet provides customized pre-training models for diverse downstream tasks. Unlike the standard mixture of experts (MoE), MoCE trains each expert only on semantically relevant images by using cluster-conditional gates. Each downstream task can thus be allocated to a customized model pre-trained on the data most similar to the downstream data. Experiments on a collection of 11 downstream tasks show that MoCE outperforms the vanilla MAE by 2.45% on average. It also achieves new state-of-the-art self-supervised learning results on detection and segmentation.
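The cluster-conditional gating described in the abstract can be pictured with a minimal sketch, not the authors' implementation: cluster ids are assumed to come from an offline clustering of image features, and the gate picks one expert feed-forward block per image from its cluster id, so each expert only sees semantically similar images. All class, parameter, and variable names below are illustrative.

```python
# Minimal sketch of cluster-conditional expert routing (illustrative, not the paper's code).
# Assumption: each image already carries a cluster id from an offline k-means over its features.
import torch
import torch.nn as nn


class ClusterConditionalExperts(nn.Module):
    """Route all tokens of an image to one expert chosen by the image's cluster id."""

    def __init__(self, dim: int, num_experts: int, num_clusters: int):
        super().__init__()
        # Each expert is a standard transformer feed-forward block.
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )
        # The gate is conditioned on the cluster id, not on individual tokens.
        self.gate = nn.Embedding(num_clusters, num_experts)

    def forward(self, tokens: torch.Tensor, cluster_ids: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len, dim); cluster_ids: (batch,)
        logits = self.gate(cluster_ids)          # (batch, num_experts)
        expert_idx = logits.argmax(dim=-1)       # hard top-1 routing per image
        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            mask = expert_idx == e
            if mask.any():
                out[mask] = expert(tokens[mask])
        return out


# Usage: route two images (clusters 0 and 2) of 196 ViT tokens each.
layer = ClusterConditionalExperts(dim=768, num_experts=4, num_clusters=8)
x = torch.randn(2, 196, 768)
y = layer(x, torch.tensor([0, 2]))
```

Because routing depends only on the cluster id, a downstream task can be served by the expert whose cluster best matches its data, which is the intuition behind the task-customized models described above.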

Cite

Text

Liu et al. "Task-Customized Masked Autoencoder via Mixture of Cluster-Conditional Experts." International Conference on Learning Representations, 2023.

Markdown

[Liu et al. "Task-Customized Masked Autoencoder via Mixture of Cluster-Conditional Experts." International Conference on Learning Representations, 2023.](https://mlanthology.org/iclr/2023/liu2023iclr-taskcustomized/)

BibTeX

@inproceedings{liu2023iclr-taskcustomized,
  title     = {{Task-Customized Masked Autoencoder via Mixture of Cluster-Conditional Experts}},
  author    = {Liu, Zhili and Chen, Kai and Han, Jianhua and Hong, Lanqing and Xu, Hang and Li, Zhenguo and Kwok, James},
  booktitle = {International Conference on Learning Representations},
  year      = {2023},
  url       = {https://mlanthology.org/iclr/2023/liu2023iclr-taskcustomized/}
}