Reproducibility Study: Understanding Multi-Agent LLM Cooperation in the GovSim Framework

Silverio, Alessio; Chezan, Carmen Michaela; van Sprang, Mathijs; Cappendijk, Tom; Smit, Martin

TMLR 2026

Abstract

Governance of the Commons Simulation (GovSim) is a Large Language Model (LLM) multi-agent framework designed to study cooperation and sustainability among LLM agents in resource-sharing environments (Piatti et al., 2024). Understanding the cooperation capabilities of LLMs is vital to the real-world applicability of these models. This study reproduces and extends the original GovSim experiments using recent small-scale open-source LLMs, including newly released instruction-tuned models such as Phi-4 and DeepSeek-R1 distill variants. We evaluate three core claims from the original paper: (1) GovSim enables the study and benchmarking of emergent sustainable behavior, (2) only the largest and most powerful LLM agents achieve a sustainable equilibrium, while smaller models fail, and (3) agents using universalization-based reasoning significantly improve sustainability. Our findings support the first claim, demonstrating that GovSim remains a valid platform for studying social reasoning in multi-agent LLM systems. However, our results challenge the second claim: recent smaller LLMs, particularly DeepSeek-R1-Distill-Qwen-14B, achieve sustainable equilibrium, indicating that advancements in model design and instruction tuning have narrowed the performance gap with larger models. Our results also support the third claim: universalization-based reasoning improves performance in the GovSim environment. However, further analysis suggests that the improvement stems primarily from the numerical instructions provided to agents rather than from the principle of universalization itself. To generalize these findings, we extend the framework to include a broader set of social reasoning frameworks. We find that reasoning strategies incorporating explicit numerical guidance consistently outperform abstract ethical prompts, highlighting the critical role of prompt specificity in influencing agent behavior.

Cite

Text

Silverio et al. "Reproducibility Study: Understanding Multi-Agent LLM Cooperation in the GovSim Framework." Transactions on Machine Learning Research, 2026.

Markdown

[Silverio et al. "Reproducibility Study: Understanding Multi-Agent LLM Cooperation in the GovSim Framework." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/silverio2026tmlr-reproducibility/)

BibTeX

@article{silverio2026tmlr-reproducibility,
  title     = {{Reproducibility Study: Understanding Multi-Agent LLM Cooperation in the GovSim Framework}},
  author    = {Silverio, Alessio and Chezan, Carmen Michaela and van Sprang, Mathijs and Cappendijk, Tom and Smit, Martin},
  journal   = {Transactions on Machine Learning Research},
  year      = {2026},
  url       = {https://mlanthology.org/tmlr/2026/silverio2026tmlr-reproducibility/}
}