Enhancing the Resilience of LLMs Against Grey-Box Extractions
Abstract
Large language models are deployed either as closed-source systems, which provide superior performance but limited customization, or as open-source systems, which ensure full transparency at the risk of asset loss. Grey-box approaches, which privatize parts of the model while exposing the rest, strike a balance between asset protection and customization, but they are vulnerable to grey-box extraction attacks that aim to replicate model functionality. In this paper, we explore privatization schemes that ensure the resilience of grey-box models against extraction attacks. First, we theoretically prove that an infinitely deep transformer contains a transition layer such that privatizing layers before it offers substantial resilience. We then introduce EX-Priv, a simple baseline that identifies a small number of early layers for privatization. We validate the effectiveness of EX-Priv across 3 architectures on 16 benchmarks and observe that privatizing a single decoder layer identified by EX-Priv yields resilience comparable to privatizing the entire model of 32 decoder layers on Llama2-7B. We also provide insights into why this approach is effective.
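The grey-box deployment split described in the abstract can be sketched as follows. This is a minimal illustration only: `split_grey_box` and the layer names are assumptions for exposition, not the paper's actual implementation.

```python
# Hypothetical sketch of a grey-box deployment split: the first k decoder
# layers stay private (e.g. served by the model owner), while the remaining
# layers are exposed to the user for customization.

def split_grey_box(layers, k):
    """Split a layer stack: layers[:k] are privatized, layers[k:] exposed."""
    private, public = layers[:k], layers[k:]
    return private, public

# Toy 32-layer stack, matching Llama2-7B's decoder depth.
layers = [f"decoder_layer_{i}" for i in range(32)]

# Per the abstract, privatizing a single early layer identified by EX-Priv
# can match the resilience of privatizing all 32 layers.
private, public = split_grey_box(layers, k=1)
```

At inference time, the user's inputs would first pass through the private layers on the owner's side before the exposed layers run locally; the extraction attacker only ever observes the public portion.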
Cite
Text
Huang et al. "Enhancing the Resilience of LLMs Against Grey-Box Extractions." ICML 2024 Workshops: NextGenAISafety, 2024.
Markdown
[Huang et al. "Enhancing the Resilience of LLMs Against Grey-Box Extractions." ICML 2024 Workshops: NextGenAISafety, 2024.](https://mlanthology.org/icmlw/2024/huang2024icmlw-enhancing/)
BibTeX
@inproceedings{huang2024icmlw-enhancing,
title = {{Enhancing the Resilience of LLMs Against Grey-Box Extractions}},
author = {Huang, Hanbo and Li, Yihan and Jiang, Bowen and Jiang, Bo and Liu, Lin and Liu, Zhuotao and Sun, Ruoyu and Liang, Shiyu},
booktitle = {ICML 2024 Workshops: NextGenAISafety},
year = {2024},
url = {https://mlanthology.org/icmlw/2024/huang2024icmlw-enhancing/}
}