BOND: Aligning LLMs with Best-of-N Distillation

Abstract

Reinforcement learning from human feedback (RLHF) is a key driver of quality and safety in state-of-the-art large language models. Yet a surprisingly simple and strong inference-time strategy is Best-of-N sampling, which selects the best generation among N candidates. In this paper, we propose Best-of-N Distillation (BOND), a novel RLHF algorithm that seeks to emulate Best-of-N without its significant computational overhead at inference time. Specifically, BOND is a distribution matching algorithm that pushes the distribution of generations from the policy toward the Best-of-N distribution. We use the Jeffreys divergence (a linear combination of forward and backward KL divergences) to balance between mode-covering and mode-seeking behavior, and derive an iterative formulation that uses a moving anchor for efficiency. We demonstrate the effectiveness of our approach and several design choices through experiments on abstractive summarization and Gemma models.
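As context for the abstract, here is a minimal sketch of the two ingredients BOND builds on: Best-of-N sampling at inference time, and a Jeffreys divergence between a policy and a target distribution. The `generate` and `reward` functions are hypothetical placeholders (not from the paper), and the exact weighting convention of the two KL terms is illustrative rather than the paper's definition.

```python
import math

def best_of_n(prompt, generate, reward, n=16):
    """Best-of-N sampling: draw n candidates from the policy and
    return the one the reward model scores highest.
    `generate` and `reward` are hypothetical placeholders."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda y: reward(prompt, y))

def jeffreys(policy_probs, target_probs, beta=0.5, eps=1e-12):
    """Jeffreys divergence over two discrete distributions:
    a weighted sum of the two KL directions. KL(target || policy)
    is mode-covering (the policy is penalized wherever the target
    has mass it misses); KL(policy || target) is mode-seeking."""
    def kl(a, b):
        return sum(ai * math.log((ai + eps) / (bi + eps))
                   for ai, bi in zip(a, b) if ai > 0)
    return beta * kl(target_probs, policy_probs) \
        + (1 - beta) * kl(policy_probs, target_probs)
```

Per the abstract, BOND minimizes such a divergence between the policy and the Best-of-N distribution during training, so that a single sample from the distilled policy approximates Best-of-N at inference time.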

Cite

Text

Sessa et al. "BOND: Aligning LLMs with Best-of-N Distillation." International Conference on Learning Representations, 2025.

Markdown

[Sessa et al. "BOND: Aligning LLMs with Best-of-N Distillation." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/sessa2025iclr-bond/)

BibTeX

@inproceedings{sessa2025iclr-bond,
  title     = {{BOND: Aligning LLMs with Best-of-N Distillation}},
  author    = {Sessa, Pier Giuseppe and Dadashi-Tazehozi, Robert and Hussenot, Léonard and Ferret, Johan and Vieillard, Nino and Ramé, Alexandre and Shahriari, Bobak and Perrin, Sarah and Friesen, Abram L. and Cideron, Geoffrey and Girgin, Sertan and Stanczyk, Piotr and Michi, Andrea and Sinopalnikov, Danila and Garea, Sabela Ramos and Héliou, Amélie and Severyn, Aliaksei and Hoffman, Matthew and Momchev, Nikola and Bachem, Olivier},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/sessa2025iclr-bond/}
}