ParCNetV2: Oversized Kernel with Enhanced Attention
Abstract
Transformers have shown great potential in various computer vision tasks. By borrowing design concepts from transformers, many studies have revolutionized CNNs and reported remarkable results; this paper continues that line of work. Specifically, we propose a new convolutional neural network, ParCNetV2, that extends ParCNetV1 by further bridging the gap between CNNs and ViTs. It introduces two key designs: 1) Oversized Convolution (OC), whose kernel is twice the size of the input, and 2) Bifurcate Gate Unit (BGU), which makes the model input-adaptive. Fusing OC and BGU in a unified CNN, ParCNetV2 flexibly extracts global features like a ViT while maintaining lower latency and better accuracy. Extensive experiments demonstrate the superiority of our method over other convolutional neural networks and over hybrid models that combine CNNs and transformers. The code is publicly available at https://github.com/XuRuihan/ParCNetV2.
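The core idea of the oversized convolution is that a kernel spanning twice the input gives every output position a global receptive field, as in self-attention. A minimal 1-D NumPy sketch (an illustration of the concept, not the paper's actual 2-D depth-wise implementation):

```python
import numpy as np

def oversized_conv1d(x, kernel):
    """1-D 'oversized' convolution: the kernel is longer than the input
    (2H - 1 taps for H inputs), so every output position is a weighted
    sum over ALL input positions -- a global receptive field."""
    H = len(x)
    assert len(kernel) == 2 * H - 1, "kernel must span twice the input size"
    full = np.convolve(x, kernel, mode="full")  # length 3H - 2
    # Crop the central H samples; each of them covers the whole input.
    start = H - 1
    return full[start:start + H]

x = np.array([1.0, 2.0, 3.0, 4.0])
k = np.ones(2 * len(x) - 1)      # 7 all-ones taps for a 4-element input
y = oversized_conv1d(x, k)
# with an all-ones kernel every output is the sum of all inputs:
# [10. 10. 10. 10.]
```

With a uniform kernel each output equals the total sum of the input, confirming that no output position is blind to any input position; learned (non-uniform) kernels would instead weight positions by relative offset.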
Cite
Text
Xu et al. "ParCNetV2: Oversized Kernel with Enhanced Attention." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.00529
Markdown
[Xu et al. "ParCNetV2: Oversized Kernel with Enhanced Attention." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/xu2023iccv-parcnetv2/) doi:10.1109/ICCV51070.2023.00529
BibTeX
@inproceedings{xu2023iccv-parcnetv2,
title = {{ParCNetV2: Oversized Kernel with Enhanced Attention}},
author = {Xu, Ruihan and Zhang, Haokui and Hu, Wenze and Zhang, Shiliang and Wang, Xiaoyu},
booktitle = {International Conference on Computer Vision},
year = {2023},
pages = {5752--5762},
doi = {10.1109/ICCV51070.2023.00529},
url = {https://mlanthology.org/iccv/2023/xu2023iccv-parcnetv2/}
}