Bounded Rationality for LLMs: Satisficing Alignment at Inference-Time

Abstract

Aligning large language models with humans is challenging due to the inherently multifaceted nature of preference feedback. While existing approaches typically frame this as a multi-objective optimization problem, they often overlook how humans actually make decisions. Research on bounded rationality suggests that human decision-making follows satisficing strategies: optimizing primary objectives while ensuring others meet acceptable thresholds. To bridge this gap and operationalize the notion of satisficing alignment, we propose SITAlign: an inference-time framework that addresses the multifaceted nature of alignment by maximizing a primary objective while satisfying threshold-based constraints on secondary criteria. We provide theoretical insights by deriving sub-optimality bounds of our satisficing-based inference alignment approach. We empirically validate SITAlign’s performance through extensive experimentation on multiple benchmarks. For instance, on the PKU-SafeRLHF dataset with the primary objective of maximizing helpfulness while ensuring a threshold on harmlessness, SITAlign outperforms the state-of-the-art multi-objective decoding strategy by a margin of 22.3% in terms of GPT-4 win-tie rate for helpfulness reward while adhering to the threshold on harmlessness.
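
To make the satisficing idea concrete, the following is a minimal sketch of threshold-constrained selection over a fixed set of candidate responses. It is not the SITAlign algorithm, which the abstract does not specify in detail; it only illustrates "maximize a primary objective subject to thresholds on secondary criteria". The function name `satisficing_select`, the score inputs, and the fallback rule for the case where no candidate meets every threshold are all assumptions made for illustration.

```python
from typing import Sequence


def satisficing_select(
    primary: Sequence[float],
    secondary: Sequence[Sequence[float]],
    thresholds: Sequence[float],
) -> int:
    """Return the index of the chosen candidate (hypothetical helper).

    primary[i]: primary-objective score of candidate i (e.g. helpfulness).
    secondary[i][j]: score of candidate i on secondary criterion j.
    thresholds[j]: minimum acceptable score on secondary criterion j.
    """
    if not primary:
        raise ValueError("no candidates to choose from")

    def violation(i: int) -> float:
        # Total shortfall of candidate i below the secondary thresholds.
        return sum(max(0.0, t - s) for s, t in zip(secondary[i], thresholds))

    # Candidates that meet every secondary threshold.
    feasible = [i for i in range(len(primary)) if violation(i) == 0.0]
    if feasible:
        # Satisficing step: best primary score among acceptable candidates.
        return max(feasible, key=lambda i: primary[i])

    # Fallback (an assumption, not from the paper): pick the candidate with
    # the smallest total violation, breaking ties by higher primary score.
    return min(range(len(primary)), key=lambda i: (violation(i), -primary[i]))


# Example: one secondary criterion (e.g. harmlessness) with threshold 0.5.
helpfulness = [0.9, 0.7, 0.4]
harmlessness = [[0.2], [0.6], [0.9]]
choice = satisficing_select(helpfulness, harmlessness, [0.5])
```

In this example the most helpful candidate (index 0) falls below the harmlessness threshold, so the selection is made among the remaining two and the more helpful of those is chosen. A weighted-sum multi-objective rule would instead trade the two scores off against each other without any notion of "good enough".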

Cite

Text

Chehade et al. "Bounded Rationality for LLMs: Satisficing Alignment at Inference-Time." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Chehade et al. "Bounded Rationality for LLMs: Satisficing Alignment at Inference-Time." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/chehade2025icml-bounded/)

BibTeX

@inproceedings{chehade2025icml-bounded,
  title     = {{Bounded Rationality for LLMs: Satisficing Alignment at Inference-Time}},
  author    = {Chehade, Mohamad Fares El Hajj and Ghosal, Soumya Suvra and Chakraborty, Souradip and Reddy, Avinash and Manocha, Dinesh and Zhu, Hao and Bedi, Amrit Singh},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {7617--7633},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/chehade2025icml-bounded/}
}