High-Efficiency Device-Cloud Collaborative Transformer Model
Abstract
Natural Language Processing (NLP) researchers have achieved significant success with unsupervised language pre-training techniques. However, modern self-attention models require far more computational and memory resources than conventional NLP models, making pre-training or even fine-tuning them quite costly. This drastically restricts their adoption across a variety of fields. To improve efficiency, we propose the Device-Cloud Collaborative Transformer, an efficient language-model framework that spans cloud and device and is designed to encourage learning of representations that generalize better across many different tasks. Specifically, we design a Device-Cloud Collaborative Transformer architecture for large language models that benefits both cloud-side and device-side modeling. Experimental results demonstrate the effectiveness of our proposed method.
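The abstract does not specify how the model is partitioned between device and cloud. As a purely illustrative sketch (an assumption, not the paper's stated design), one common realization of device-cloud collaboration is to run a few lightweight layers on-device and ship the intermediate activations to heavier layers hosted in the cloud; all names below are hypothetical:

```python
# Hypothetical sketch of a device-cloud split, NOT the paper's actual method:
# the first (lightweight) layers run on-device, the remaining (heavy) layers
# run in the cloud, and only intermediate activations cross the network.
from dataclasses import dataclass
from typing import Callable, List

Layer = Callable[[List[float]], List[float]]

def make_layer(scale: float) -> Layer:
    # Stand-in for a transformer block: a simple elementwise transform.
    return lambda h: [scale * x + 0.1 for x in h]

@dataclass
class SplitModel:
    device_layers: List[Layer]  # cheap layers kept on the device
    cloud_layers: List[Layer]   # expensive layers hosted server-side

    def device_forward(self, h: List[float]) -> List[float]:
        for layer in self.device_layers:
            h = layer(h)
        return h  # intermediate activations, sent over the network

    def cloud_forward(self, h: List[float]) -> List[float]:
        for layer in self.cloud_layers:
            h = layer(h)
        return h  # final outputs, returned to the device

model = SplitModel(device_layers=[make_layer(1.0)],
                   cloud_layers=[make_layer(2.0), make_layer(0.5)])
activations = model.device_forward([1.0, 2.0])  # runs locally
logits = model.cloud_forward(activations)       # runs in the cloud
```

The key property of such a split is that it is mathematically identical to running the full layer stack in one place; only the placement of compute (and the activation traffic) changes.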
Cite
Text
Jiang et al. "High-Efficiency Device-Cloud Collaborative Transformer Model." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023. doi:10.1109/CVPRW59228.2023.00214
Markdown
[Jiang et al. "High-Efficiency Device-Cloud Collaborative Transformer Model." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023.](https://mlanthology.org/cvprw/2023/jiang2023cvprw-highefficiency/) doi:10.1109/CVPRW59228.2023.00214
BibTeX
@inproceedings{jiang2023cvprw-highefficiency,
title = {{High-Efficiency Device-Cloud Collaborative Transformer Model}},
author = {Jiang, Penghao and Xin, Ke and Li, Chunxi and Zhou, Yinsi},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2023},
  pages = {2204--2210},
doi = {10.1109/CVPRW59228.2023.00214},
url = {https://mlanthology.org/cvprw/2023/jiang2023cvprw-highefficiency/}
}