High-Efficiency Device-Cloud Collaborative Transformer Model

Abstract

Natural Language Processing (NLP) has seen significant success with unsupervised language pre-training techniques. However, modern self-attention models require far more computational and memory resources than conventional NLP models, making pre-training or even fine-tuning them quite costly. This drastically restricts their adoption across a variety of fields. To improve efficiency, we propose the Device-Cloud Collaborative Transformer, an efficient language-model framework that spans cloud and device and is designed to encourage the learning of representations that generalize better to many different tasks. Specifically, we design a Device-Cloud Collaborative Transformer architecture for large language models that benefits both cloud modeling and device modeling. Experimental results demonstrate the effectiveness of our proposed method.
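The abstract does not detail how the model is partitioned, but the general idea of device-cloud collaborative inference can be sketched as running a model's shallow layers on-device and its deeper layers in the cloud. The sketch below is an illustrative assumption, not the authors' implementation: transformer blocks are stubbed as simple functions, and the split point `n_device_layers` is a hypothetical parameter.

```python
# Minimal sketch (NOT the paper's implementation) of device-cloud
# collaborative inference: shallow layers run locally on the device,
# deeper layers run in the cloud, and only the intermediate hidden
# state crosses the network boundary.

from typing import Callable, List

Hidden = List[float]


def make_layer(scale: float) -> Callable[[Hidden], Hidden]:
    """Stand-in for one transformer block (attention + FFN)."""
    return lambda h: [scale * x + 1.0 for x in h]


class DeviceCloudModel:
    def __init__(self, layers: List[Callable[[Hidden], Hidden]],
                 n_device_layers: int):
        # The first n_device_layers run on-device; the rest "in the cloud".
        self.device_layers = layers[:n_device_layers]
        self.cloud_layers = layers[n_device_layers:]

    def run_on_device(self, h: Hidden) -> Hidden:
        for layer in self.device_layers:
            h = layer(h)
        return h  # compact hidden state uploaded to the cloud

    def run_in_cloud(self, h: Hidden) -> Hidden:
        for layer in self.cloud_layers:
            h = layer(h)
        return h

    def forward(self, h: Hidden) -> Hidden:
        # End-to-end result is identical to running all layers in one place;
        # only the placement of the computation differs.
        return self.run_in_cloud(self.run_on_device(h))


layers = [make_layer(0.5) for _ in range(6)]
model = DeviceCloudModel(layers, n_device_layers=2)
out = model.forward([1.0, 2.0])
```

The split point trades device compute against upload cost: more on-device layers mean less cloud load but more local work, while the transmitted hidden state stays the same size regardless of where the cut is made.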

Cite

Text

Jiang et al. "High-Efficiency Device-Cloud Collaborative Transformer Model." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023. doi:10.1109/CVPRW59228.2023.00214

Markdown

[Jiang et al. "High-Efficiency Device-Cloud Collaborative Transformer Model." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023.](https://mlanthology.org/cvprw/2023/jiang2023cvprw-highefficiency/) doi:10.1109/CVPRW59228.2023.00214

BibTeX

@inproceedings{jiang2023cvprw-highefficiency,
  title     = {{High-Efficiency Device-Cloud Collaborative Transformer Model}},
  author    = {Jiang, Penghao and Xin, Ke and Li, Chunxi and Zhou, Yinsi},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2023},
  pages     = {2204--2210},
  doi       = {10.1109/CVPRW59228.2023.00214},
  url       = {https://mlanthology.org/cvprw/2023/jiang2023cvprw-highefficiency/}
}