Foundational Challenges in Assuring Alignment and Safety of Large Language Models
Abstract
This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs). These challenges are organized into three different categories: scientific understanding of LLMs, development and deployment methods, and sociotechnical challenges. Based on the identified challenges, we pose 200+ concrete research questions.
Cite
Text
Anwar et al. "Foundational Challenges in Assuring Alignment and Safety of Large Language Models." Transactions on Machine Learning Research, 2024.
Markdown
[Anwar et al. "Foundational Challenges in Assuring Alignment and Safety of Large Language Models." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/anwar2024tmlr-foundational/)
BibTeX
@article{anwar2024tmlr-foundational,
title = {{Foundational Challenges in Assuring Alignment and Safety of Large Language Models}},
author = {Anwar, Usman and Saparov, Abulhair and Rando, Javier and Paleka, Daniel and Turpin, Miles and Hase, Peter and Lubana, Ekdeep Singh and Jenner, Erik and Casper, Stephen and Sourbut, Oliver and Edelman, Benjamin L. and Zhang, Zhaowei and Günther, Mario and Korinek, Anton and Hernandez-Orallo, Jose and Hammond, Lewis and Bigelow, Eric J and Pan, Alexander and Langosco, Lauro and Korbak, Tomasz and Zhang, Heidi Chenyu and Zhong, Ruiqi and Ó hÉigeartaigh, Seán and Recchia, Gabriel and Corsi, Giulio and Chan, Alan and Anderljung, Markus and Edwards, Lilian and Petrov, Aleksandar and de Witt, Christian Schroeder and Motwani, Sumeet Ramesh and Bengio, Yoshua and Chen, Danqi and Torr, Philip and Albanie, Samuel and Maharaj, Tegan and Foerster, Jakob Nicolaus and Tramèr, Florian and He, He and Kasirzadeh, Atoosa and Choi, Yejin and Krueger, David},
journal = {Transactions on Machine Learning Research},
year = {2024},
url = {https://mlanthology.org/tmlr/2024/anwar2024tmlr-foundational/}
}