Societal Alignment Frameworks Can Improve LLM Alignment

Stanczak, Karolina; Meade, Nicholas; Bhatia, Mehar; Zhou, Hattie; Böttinger, Konstantin; Barnes, Jeremy; Stanley, Jason; Montgomery, Jessica; Zemel, Richard; Papernot, Nicolas; Chapados, Nicolas; Therien, Denis; Lillicrap, Timothy P; Marasovic, Ana; Delacroix, Sylvie; Hadfield, Gillian K; Reddy, Siva

ICLRW 2025

Abstract

Recent progress in large language models (LLMs) has focused on producing responses that meet human expectations and align with shared values, a process termed alignment. However, aligning LLMs remains challenging due to the inherent disconnect between the complexity of human values and the narrow nature of the technological approaches designed to address them. Current alignment methods often lead to misspecified objectives, reflecting the broader issue of incomplete contracts: the impracticality of specifying a contract between a model developer and the model that accounts for every scenario in LLM alignment. In this paper, we argue that improving LLM alignment requires incorporating insights from societal alignment frameworks, including social, economic, and contractual alignment, and we provide concrete solutions drawn from these domains. Given the role of uncertainty in contract formalization within societal alignment frameworks, we investigate how this uncertainty manifests in LLM alignment. We end our discussion by offering an alternative view on LLM alignment, framing the under-specified nature of its objectives as an opportunity rather than seeking to perfect their specification. Beyond technical improvements in LLM alignment, we discuss the need for participatory alignment interface designs.

Cite

Text

Stanczak et al. "Societal Alignment Frameworks Can Improve LLM Alignment." ICLR 2025 Workshops: Bi-Align, 2025.

Markdown

[Stanczak et al. "Societal Alignment Frameworks Can Improve LLM Alignment." ICLR 2025 Workshops: Bi-Align, 2025.](https://mlanthology.org/iclrw/2025/stanczak2025iclrw-societal/)

BibTeX

@inproceedings{stanczak2025iclrw-societal,
  title     = {{Societal Alignment Frameworks Can Improve LLM Alignment}},
  author    = {Stanczak, Karolina and Meade, Nicholas and Bhatia, Mehar and Zhou, Hattie and Böttinger, Konstantin and Barnes, Jeremy and Stanley, Jason and Montgomery, Jessica and Zemel, Richard and Papernot, Nicolas and Chapados, Nicolas and Therien, Denis and Lillicrap, Timothy P and Marasovic, Ana and Delacroix, Sylvie and Hadfield, Gillian K and Reddy, Siva},
  booktitle = {ICLR 2025 Workshops: Bi-Align},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/stanczak2025iclrw-societal/}
}