An Evolutionary Perspective on AI Alignment (Student Abstract)

Abstract

Attempting to align AI capabilities and value structures by means of value elicitation from humans, such as through Reinforcement Learning from Human Feedback (RLHF), is a computational challenge that raises both psychological and philosophical questions. Adopting an evolutionary perspective on the emergence of value structures in humans and machine learning systems can offer a bridge between qualitative and quantitative aspects of alignment. Here, evolutionary dynamics are applied to a game-theoretic model of RLHF. This allows for formal reasoning about the process and capabilities that result from alignment training, even where quantitative benchmarks cannot be clearly defined. A simple parametrized game model of RLHF, subject to replicator dynamics, shows how the success of the training method is sensitive to bias in human judgments. Under ideal conditions, RLHF training leads to aligned behavior. If the choice pattern of the human judge is biased, the training instead incentivizes misalignment. This application shows that evolutionary analyses can contribute to improving the prospects for safety and support successful cooperation between humans and AI systems in deployment.
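
The abstract does not give the game's payoffs, so the following is only a minimal illustrative sketch of how replicator dynamics can make an outcome depend on judge bias. It is not the paper's model. Everything in it is an assumption made for illustration: the two strategies ("aligned" and "misaligned"), the single `bias` parameter in [0, 1], the linear payoffs `1 - bias` and `bias`, and the Euler step size `dt`.

```python
# Hypothetical two-strategy replicator dynamics (not the paper's model).
# x is the population share of the "aligned" strategy.
# Assumed payoffs: the judge rewards aligned behavior with 1 - bias
# and misaligned behavior with bias, where bias is in [0, 1].

def replicator_step(x, bias, dt=0.1):
    """One Euler step of dx/dt = x * (f_aligned - f_mean)."""
    f_aligned = 1.0 - bias
    f_misaligned = bias
    f_mean = x * f_aligned + (1.0 - x) * f_misaligned
    return x + dt * x * (f_aligned - f_mean)


def simulate(x0=0.5, bias=0.0, steps=2000, dt=0.1):
    """Iterate the replicator step and return the final aligned share."""
    x = x0
    for _ in range(steps):
        x = replicator_step(x, bias, dt)
    return x


if __name__ == "__main__":
    for b in (0.0, 0.3, 0.5, 0.7, 0.9):
        print(f"bias={b:.1f} -> aligned share {simulate(bias=b):.4f}")
```

Under these assumed payoffs, the aligned share grows toward 1 when `bias` is below 0.5 and shrinks toward 0 when it is above 0.5, which mirrors the qualitative claim that an unbiased judge selects for aligned behavior while a biased one can select against it.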

Cite

Text

Mattsson. "An Evolutionary Perspective on AI Alignment (Student Abstract)." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I28.35276

Markdown

[Mattsson. "An Evolutionary Perspective on AI Alignment (Student Abstract)." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/mattsson2025aaai-evolutionary/) doi:10.1609/AAAI.V39I28.35276

BibTeX

@inproceedings{mattsson2025aaai-evolutionary,
  title     = {{An Evolutionary Perspective on AI Alignment (Student Abstract)}},
  author    = {Mattsson, Ida},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {29428-29430},
  doi       = {10.1609/AAAI.V39I28.35276},
  url       = {https://mlanthology.org/aaai/2025/mattsson2025aaai-evolutionary/}
}