Calibrating Language Models via Augmented Prompt Ensembles

Abstract

Large Language Models (LLMs) have achieved remarkable success, but often exhibit overconfidence and poor calibration, particularly after instruction-finetuning, which limits their reliability and applicability. To address this, we investigate ensembles, a technique known to enhance neural network calibration but underexplored in LLMs, possibly due to the computational cost of training and evaluating multiple LLMs. We introduce Calibration via Augmented Prompt Ensembles (CAPE), a practical approach to LLM ensembles that leverages the inherent prompt sensitivity of LLMs by augmenting prompts, e.g., by template paraphrasing or option permutation. Our method requires no additional training and can be efficiently evaluated in batch mode, yielding significant calibration improvements for instruction-tuned LLMs.
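
To make the idea concrete, below is a minimal sketch (not the authors' code) of a CAPE-style prompt ensemble for a multiple-choice question: prompts are augmented via paraphrased templates and option permutations, the model scores each variant, and the per-option probabilities are averaged after mapping permuted options back to their canonical order. The templates shown and the `score_options(prompt, options)` helper are assumptions standing in for whatever LLM scoring call is actually used.

```python
# Illustrative sketch of calibration via an augmented prompt ensemble.
# `score_options(prompt, options) -> list[float]` is a placeholder for any
# LLM call that returns one probability per answer option.
import random

TEMPLATES = [  # assumed paraphrased prompt templates
    "Question: {q}\nOptions:\n{opts}\nAnswer:",
    "{q}\nWhich of the following is correct?\n{opts}\nThe answer is:",
]

def format_options(options, labels="ABCDEFGH"):
    return "\n".join(f"{l}. {o}" for l, o in zip(labels, options))

def cape_probabilities(question, options, score_options,
                       n_permutations=3, seed=0):
    """Average option probabilities over augmented prompts
    (template paraphrases x option permutations)."""
    rng = random.Random(seed)
    k = len(options)
    totals = [0.0] * k
    count = 0
    for template in TEMPLATES:
        for _ in range(n_permutations):
            perm = list(range(k))
            rng.shuffle(perm)                        # option permutation
            permuted = [options[i] for i in perm]
            prompt = template.format(q=question, opts=format_options(permuted))
            probs = score_options(prompt, permuted)  # placeholder LLM call
            for pos, orig_idx in enumerate(perm):    # undo the permutation
                totals[orig_idx] += probs[pos]
            count += 1
    return [t / count for t in totals]               # ensemble-averaged confidence
```

Because each augmented prompt is independent, the variants can be scored in a single batched forward pass, which is what keeps the ensemble inexpensive relative to training or evaluating multiple models.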

Cite

Text

Jiang et al. "Calibrating Language Models via Augmented Prompt Ensembles." ICML 2023 Workshops: DeployableGenerativeAI, 2023.

Markdown

[Jiang et al. "Calibrating Language Models via Augmented Prompt Ensembles." ICML 2023 Workshops: DeployableGenerativeAI, 2023.](https://mlanthology.org/icmlw/2023/jiang2023icmlw-calibrating/)

BibTeX

@inproceedings{jiang2023icmlw-calibrating,
  title     = {{Calibrating Language Models via Augmented Prompt Ensembles}},
  author    = {Jiang, Mingjian and Ruan, Yangjun and Huang, Sicong and Liao, Saifei and Pitis, Silviu and Grosse, Roger Baker and Ba, Jimmy},
  booktitle = {ICML 2023 Workshops: DeployableGenerativeAI},
  year      = {2023},
  url       = {https://mlanthology.org/icmlw/2023/jiang2023icmlw-calibrating/}
}