Annotation-Efficient Honesty Alignment via Confidence Elicitation and Calibration

Ni, Shiyu; Bi, Keping; Guo, Jiafeng; Tang, Minghao; Wu, Jingtong; Han, Zengxin; Cheng, Xueqi

ICLR 2026

Abstract

Honesty alignment, the ability of large language models (LLMs) to recognize their knowledge boundaries and express calibrated confidence, is essential for trustworthy deployment. Existing methods either rely on training-free confidence estimation (e.g., token probabilities, self-consistency) or training-based calibration with correctness annotations. While effective, the latter demands costly, large-scale labeling. We introduce Elicitation-Then-Calibration (EliCal), a two-stage framework that first elicits internal confidence using inexpensive self-consistency supervision, then calibrates this confidence with a small set of correctness annotations. This design substantially reduces annotation requirements while improving generalization across tasks. To support a large-scale study, we release HonestyBench, a benchmark covering ten free-form QA datasets with 560k training and 70k evaluation instances annotated with correctness and self-consistency signals. Experiments show that EliCal achieves near-optimal alignment with only 1k correctness annotations (~0.18% of full supervision) and better alignment performance on unseen MMLU tasks than the calibration-only baseline, offering a scalable solution toward universal honesty alignment in LLMs.
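
The two-stage idea in the abstract can be illustrated with a small, self-contained sketch. It assumes that the self-consistency signal is the fraction of sampled answers that match the model's main answer after simple lowercase/whitespace normalization, and it uses Platt scaling on that scalar score as a stand-in for the calibration stage. Both choices, and the names `self_consistency_confidence`, `fit_platt`, and `calibrate`, are hypothetical illustrations rather than EliCal's method: the abstract describes both stages as supervision signals for the model, whereas this sketch only computes a label-free target and rescales a scalar score post hoc using a few correctness labels.

```python
import math

# Hypothetical sketch: the answer-matching rule and Platt scaling are
# assumptions for illustration, not EliCal's actual training procedure.


def _normalize(answer: str) -> str:
    """Lowercase and collapse whitespace (an assumed, deliberately simple matcher)."""
    return " ".join(answer.lower().split())


def self_consistency_confidence(main_answer: str, samples: list[str]) -> float:
    """Stage-1 target (no labels needed): share of samples agreeing with the main answer."""
    if not samples:
        raise ValueError("need at least one sampled answer")
    target = _normalize(main_answer)
    return sum(_normalize(s) == target for s in samples) / len(samples)


def _logit(p: float, eps: float = 1e-6) -> float:
    p = min(max(p, eps), 1.0 - eps)  # keep log() finite at 0 and 1
    return math.log(p / (1.0 - p))


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_platt(confidences: list[float], labels: list[int],
              lr: float = 0.1, steps: int = 2000) -> tuple[float, float]:
    """Stage-2 stand-in: fit p(correct) = sigmoid(a * logit(c) + b) by gradient descent."""
    xs = [_logit(c) for c in confidences]
    n = len(xs)
    a, b = 1.0, 0.0  # start from the identity mapping
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, labels):
            err = _sigmoid(a * x + b) - y  # d(log loss)/d(logit)
            grad_a += err * x
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def calibrate(confidence: float, a: float, b: float) -> float:
    """Map an elicited confidence to a calibrated probability of correctness."""
    return _sigmoid(a * _logit(confidence) + b)


if __name__ == "__main__":
    # Stage 1: cheap target from four sampled answers.
    print(self_consistency_confidence("Paris", ["Paris", "paris ", "Lyon", "Paris"]))  # 0.75

    # Stage 2: a handful of (elicited confidence, correctness) pairs.
    confs = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
    correct = [1, 1, 0, 1, 0, 0]
    a, b = fit_platt(confs, correct)
    print(round(a, 3), round(calibrate(0.9, a, b), 3))
```

Platt scaling is used here only because it has two parameters and can be fit from very few labels, which mirrors the abstract's point that the expensive correctness annotations are needed only for the second stage. On the six toy points above, the fitted slope comes out slightly below 1, so overconfident raw scores are softened.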

Cite

Text

Ni et al. "Annotation-Efficient Honesty Alignment via Confidence Elicitation and Calibration." International Conference on Learning Representations, 2026.

Markdown

[Ni et al. "Annotation-Efficient Honesty Alignment via Confidence Elicitation and Calibration." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/ni2026iclr-annotationefficient/)

BibTeX

@inproceedings{ni2026iclr-annotationefficient,
  title     = {{Annotation-Efficient Honesty Alignment via Confidence Elicitation and Calibration}},
  author    = {Ni, Shiyu and Bi, Keping and Guo, Jiafeng and Tang, Minghao and Wu, Jingtong and Han, Zengxin and Cheng, Xueqi},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/ni2026iclr-annotationefficient/}
}