Batch Calibration: Rethinking Calibration for In-Context Learning and Prompt Engineering
Abstract
Prompting and in-context learning (ICL) have become efficient learning paradigms for large language models (LLMs). However, LLMs suffer from prompt brittleness and various bias factors in the prompt, including but not limited to the formatting, the choice of verbalizers, and the ICL examples. To address this problem that results in unexpected performance degradation, calibration methods have been developed to mitigate the effects of these biases while recovering LLM performance. In this work, we first conduct a systematic analysis of the existing calibration methods, where we both provide a unified view and reveal the failure cases. Inspired by these analyses, we propose Batch Calibration (BC), a simple yet intuitive method that controls the contextual bias from the batched input, unifies various prior approaches, and effectively addresses the aforementioned issues. BC is zero-shot, inference-only, and incurs negligible additional costs. We validate the effectiveness of BC with PaLM 2-(S, M, L) and CLIP models and demonstrate state-of-the-art performance over previous calibration baselines across more than 10 natural language understanding tasks.
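The abstract does not give the calibration formula, so the following is only a minimal sketch of the general idea of estimating a contextual bias from a batch of inputs. It assumes the bias for each class can be approximated by that class's mean log-probability over the batch and removed by subtraction; this estimator, the function names `batch_calibrate` and `predict`, and the example scores are all illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List


def batch_calibrate(log_probs: List[List[float]]) -> List[List[float]]:
    """Subtract each class's batch-mean log-probability from every row.

    Assumption: the contextual bias is estimated as the per-class mean
    over the batch. The abstract does not specify the exact estimator.
    """
    n = len(log_probs)
    num_classes = len(log_probs[0])
    class_means = [
        sum(row[j] for row in log_probs) / n for j in range(num_classes)
    ]
    return [
        [row[j] - class_means[j] for j in range(num_classes)]
        for row in log_probs
    ]


def predict(scores: List[List[float]]) -> List[int]:
    """Return the index of the highest-scoring class for each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


# Hypothetical two-class scores from a prompt that favours class 0.
batch = [
    [math.log(0.90), math.log(0.10)],
    [math.log(0.60), math.log(0.40)],
    [math.log(0.80), math.log(0.20)],
    [math.log(0.55), math.log(0.45)],
]

raw_predictions = predict(batch)
calibrated_predictions = predict(batch_calibrate(batch))
```

In this made-up batch every raw prediction is class 0, while the calibrated scores split the inputs between the two classes. The step needs no labels or extra model calls, only the scores already computed for the batch, which is consistent with the zero-shot, inference-only description above.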
Cite
Text
Zhou et al. "Batch Calibration: Rethinking Calibration for In-Context Learning and Prompt Engineering." NeurIPS 2023 Workshops: R0-FoMo, 2023.
Markdown
[Zhou et al. "Batch Calibration: Rethinking Calibration for In-Context Learning and Prompt Engineering." NeurIPS 2023 Workshops: R0-FoMo, 2023.](https://mlanthology.org/neuripsw/2023/zhou2023neuripsw-batch/)
BibTeX
@inproceedings{zhou2023neuripsw-batch,
title = {{Batch Calibration: Rethinking Calibration for In-Context Learning and Prompt Engineering}},
author = {Zhou, Han and Wan, Xingchen and Proleev, Lev and Mincu, Diana and Chen, Jilin and Heller, Katherine and Roy, Subhrajit},
booktitle = {NeurIPS 2023 Workshops: R0-FoMo},
year = {2023},
url = {https://mlanthology.org/neuripsw/2023/zhou2023neuripsw-batch/}
}