RepMLPNet: Hierarchical Vision MLP with Re-Parameterized Locality

Abstract

Compared to convolutional layers, fully-connected (FC) layers are better at modeling long-range dependencies but worse at capturing local patterns, and are hence usually less favored for image recognition. In this paper, we propose a methodology, Locality Injection, to incorporate local priors into an FC layer by merging the trained parameters of a parallel conv kernel into the FC kernel. Locality Injection can be viewed as a novel Structural Re-parameterization method since it equivalently converts the structures by transforming the parameters. Based on that, we propose a multi-layer-perceptron (MLP) block named RepMLP Block, which uses three FC layers to extract features, and a novel architecture named RepMLPNet. The hierarchical design distinguishes RepMLPNet from the other concurrently proposed vision MLPs. As it produces feature maps of different levels, it qualifies as a backbone model for downstream tasks like semantic segmentation. Our results reveal that 1) Locality Injection is a general methodology for MLP models; 2) RepMLPNet has a favorable accuracy-efficiency trade-off compared to the other MLPs; 3) RepMLPNet is the first MLP that seamlessly transfers to Cityscapes semantic segmentation. The code and models are available at https://github.com/DingXiaoH/RepMLP.
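The idea of merging a parallel conv kernel into an FC kernel can be illustrated with a small sketch. It is not the authors' implementation: it assumes a single channel, a tiny 4x5 input, a 3x3 kernel with zero padding, and no normalization layers, and all function names (`conv2d_same`, `conv_to_fc`, `fc`) are hypothetical helpers written for this example. Because a convolution is linear, its equivalent FC matrix can be recovered by feeding it one-hot basis images; adding that matrix to the FC weights yields a single FC layer that reproduces the sum of the two branches.

```python
import random


def conv2d_same(img, kernel):
    """Single-channel 2D cross-correlation with zero padding (same output size)."""
    h, w = len(img), len(img[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[di][dj] * img[y][x]
            out[i][j] = acc
    return out


def flatten(img):
    """Row-major flattening of a 2D list."""
    return [v for row in img for v in row]


def conv_to_fc(kernel, h, w):
    """Build the (h*w) x (h*w) matrix M with M @ flatten(img) == flatten(conv(img)).

    Column `col` of M is the conv response to the one-hot image whose
    only nonzero pixel is at flat index `col`.
    """
    n = h * w
    m = [[0.0] * n for _ in range(n)]
    for col in range(n):
        basis = [[0.0] * w for _ in range(h)]
        basis[col // w][col % w] = 1.0
        resp = flatten(conv2d_same(basis, kernel))
        for row in range(n):
            m[row][col] = resp[row]
    return m


def fc(weight, vec):
    """Plain matrix-vector product (FC layer without bias)."""
    return [sum(wij * vj for wij, vj in zip(row, vec)) for row in weight]


random.seed(0)
H, W = 4, 5  # assumed toy resolution
img = [[random.uniform(-1, 1) for _ in range(W)] for _ in range(H)]
fc_weight = [[random.uniform(-1, 1) for _ in range(H * W)] for _ in range(H * W)]
kernel = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]

# Two parallel branches: FC output plus conv output.
x = flatten(img)
two_branch = [a + b for a, b in zip(fc(fc_weight, x), flatten(conv2d_same(img, kernel)))]

# Merged form: add the conv's equivalent FC matrix to the FC weights.
conv_as_fc = conv_to_fc(kernel, H, W)
merged = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(fc_weight, conv_as_fc)]
single_fc = fc(merged, x)

# The two forms should agree up to floating-point rounding.
max_err = max(abs(a - b) for a, b in zip(two_branch, single_fc))
```

In this toy setting the merged layer has the same cost as the original FC layer, since the conv branch is absorbed into its weights. The equivalent matrix `conv_as_fc` is sparse, with at most nine nonzero entries per row for a 3x3 kernel, which is one way to see the local prior the conv contributes.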

Cite

Text

Ding et al. "RepMLPNet: Hierarchical Vision MLP with Re-Parameterized Locality." Conference on Computer Vision and Pattern Recognition, 2022. doi:10.1109/CVPR52688.2022.00066

Markdown

[Ding et al. "RepMLPNet: Hierarchical Vision MLP with Re-Parameterized Locality." Conference on Computer Vision and Pattern Recognition, 2022.](https://mlanthology.org/cvpr/2022/ding2022cvpr-repmlpnet/) doi:10.1109/CVPR52688.2022.00066

BibTeX

@inproceedings{ding2022cvpr-repmlpnet,
  title     = {{RepMLPNet: Hierarchical Vision MLP with Re-Parameterized Locality}},
  author    = {Ding, Xiaohan and Chen, Honghao and Zhang, Xiangyu and Han, Jungong and Ding, Guiguang},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2022},
  pages     = {578-587},
  doi       = {10.1109/CVPR52688.2022.00066},
  url       = {https://mlanthology.org/cvpr/2022/ding2022cvpr-repmlpnet/}
}