Simple Permutations Can Fool Llama: Permutation Attack and Defense for Large Language Models

Abstract

In-context learning (ICL) enables Large Language Models (LLMs) to undertake challenging tasks through given examples. However, it is prone to instability: different orderings of input examples can significantly influence predictions. Current mitigation strategies focus on post-processing and fail to enhance the model's inherent robustness. This paper extensively investigates this issue in LLMs and uncovers a natural, permutation-based attack that achieves nearly 100% success rates against LLMs while remaining imperceptible to humans. To address this vulnerability, we propose a distributionally robust optimization (DRO)-based tuning method as a defense, explicitly optimizing the model's performance against worst-case permutations to bolster robustness. Our framework comprises two modules: the Permutation Proposal network (P-Net) and the LLM. The P-Net formulates the identification of the most challenging permutation as an optimal transport problem, solved using the Sinkhorn algorithm. Through adversarial training, the P-Net progressively enhances the LLM's robustness against permutation instability. Experiments on a synthetic task and an ICL tuning task demonstrate that our methodology effectively mitigates permutation attacks and enhances overall performance.

Cite

Text

Chen et al. "Simple Permutations Can Fool Llama: Permutation Attack and Defense for Large Language Models." ICLR 2024 Workshops: SeT_LLM, 2024.

Markdown

[Chen et al. "Simple Permutations Can Fool Llama: Permutation Attack and Defense for Large Language Models." ICLR 2024 Workshops: SeT_LLM, 2024.](https://mlanthology.org/iclrw/2024/chen2024iclrw-simple/)

BibTeX

@inproceedings{chen2024iclrw-simple,
  title     = {{Simple Permutations Can Fool Llama: Permutation Attack and Defense for Large Language Models}},
  author    = {Chen, Liang and Bian, Yatao and Shen, Li and Wong, Kam-Fai},
  booktitle = {ICLR 2024 Workshops: SeT_LLM},
  year      = {2024},
  url       = {https://mlanthology.org/iclrw/2024/chen2024iclrw-simple/}
}