Language Alignment via Nash-Learning and Adaptive Feedback

Abstract

Recent research has shown the potential of Nash Learning via Human Feedback for large language model alignment, which incorporates a preference model into a minimax game. We take this idea further by casting alignment as a mirror descent algorithm against the adaptive feedback of an improved opponent, thereby removing the need to learn a preference model or to rely on an annotated dataset. The resulting algorithm, which we refer to as Language Alignment via Nash-Learning and Adaptive Feedback (LANA), is capable of self-alignment without a human-annotated preference dataset. We support this claim with experiments and mathematical discussion.

Cite

Text

Azarafrooz and Faal. "Language Alignment via Nash-Learning and Adaptive Feedback." ICML 2024 Workshops: MFHAIA, 2024.

Markdown

[Azarafrooz and Faal. "Language Alignment via Nash-Learning and Adaptive Feedback." ICML 2024 Workshops: MFHAIA, 2024.](https://mlanthology.org/icmlw/2024/azarafrooz2024icmlw-language/)

BibTeX

@inproceedings{azarafrooz2024icmlw-language,
  title     = {{Language Alignment via Nash-Learning and Adaptive Feedback}},
  author    = {Azarafrooz, Ari and Faal, Farshid},
  booktitle = {ICML 2024 Workshops: MFHAIA},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/azarafrooz2024icmlw-language/}
}