PriViT: Vision Transformers for Private Inference

Abstract

The Vision Transformer (ViT) architecture has emerged as the backbone of choice for state-of-the-art deep models for computer vision applications. However, ViTs are ill-suited for private inference using secure multi-party computation (MPC) protocols, due to the large number of non-polynomial operations (self-attention, feed-forward rectifiers, layer normalization). We develop PriViT, a gradient-based algorithm to selectively Taylorize nonlinearities in ViTs while maintaining their prediction accuracy. Our algorithm is conceptually very simple, easy to implement, and achieves improved performance over existing MPC-friendly transformer architectures in terms of the latency-accuracy Pareto frontier.
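The core idea, replacing selected nonlinearities with low-degree polynomials that are cheap under MPC, can be illustrated with a minimal sketch. This is not the paper's implementation: here we approximate GELU (the feed-forward activation in standard ViTs) by its second-order Taylor expansion around zero, and mix it with the exact activation through a hypothetical per-unit gate `alpha`; in PriViT such selections are learned with gradients rather than fixed by hand.

```python
import math

def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF.
    The erf call is expensive to evaluate under MPC protocols."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_taylor2(x):
    """Second-order Taylor expansion of GELU around 0:
    gelu(x) ~ x/2 + x^2 / sqrt(2*pi).
    A polynomial like this needs only additions and multiplications,
    which MPC protocols handle efficiently."""
    return 0.5 * x + x * x / math.sqrt(2.0 * math.pi)

def gated_activation(x, alpha):
    """Hypothetical gate mixing the polynomial surrogate with the exact
    nonlinearity: alpha = 1 selects the MPC-friendly polynomial,
    alpha = 0 keeps the original GELU. (A sketch of the selection
    mechanism; PriViT optimizes such choices jointly with the weights.)"""
    return alpha * gelu_taylor2(x) + (1.0 - alpha) * gelu(x)

# Near zero, the polynomial tracks the exact GELU closely.
for x in (-0.2, 0.0, 0.1, 0.3):
    assert abs(gelu(x) - gelu_taylor2(x)) < 1e-2
```

The approximation is only faithful near the expansion point, which is why a selective, accuracy-aware scheme is needed rather than Taylorizing every nonlinearity uniformly.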

Cite

Text

Dhyani et al. "PriViT: Vision Transformers for Private Inference." Transactions on Machine Learning Research, 2024.

Markdown

[Dhyani et al. "PriViT: Vision Transformers for Private Inference." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/dhyani2024tmlr-privit/)

BibTeX

@article{dhyani2024tmlr-privit,
  title     = {{PriViT: Vision Transformers for Private Inference}},
  author    = {Dhyani, Naren and Mo, Jianqiao Cambridge and Yubeaton, Patrick and Cho, Minsu and Joshi, Ameya and Garg, Siddharth and Reagen, Brandon and Hegde, Chinmay},
  journal   = {Transactions on Machine Learning Research},
  year      = {2024},
  url       = {https://mlanthology.org/tmlr/2024/dhyani2024tmlr-privit/}
}