On the Limitations of LLM-Synthesized Social Media Misinformation Moderation

Abstract

Despite significant advances in Large Language Models (LLMs), their effectiveness in social media misinformation moderation -- specifically in generating moderation texts with accuracy, coherence, and citation reliability comparable to human efforts like Community Notes (CNs) on X -- remains an open question. In this work, we introduce ModBench, a real-world misinformation moderation benchmark consisting of tweets flagged as misleading alongside their corresponding human-written CNs. We evaluate representative open- and closed-source LLMs on ModBench, prompting them to generate CN-style moderation notes given human-written CN demonstrations and the relevant web-sourced references used by CN creators. Our findings reveal persistent and significant flaws in LLM-generated moderation notes, underscoring the continued need for trustworthy human-written information in accurate and reliable misinformation moderation.
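
As a rough illustration of the evaluation setup the abstract describes, the sketch below shows one way such few-shot prompting could be implemented. It assumes the OpenAI Python client as a stand-in for any of the evaluated LLMs; every name here (`build_prompt`, `generate_note`, the model string) is hypothetical and not taken from ModBench itself.

```python
# Hypothetical sketch of the prompting setup described in the abstract:
# give an LLM human-written Community Note demonstrations plus the
# web-sourced references used by CN creators, then ask it to draft a
# CN-style note for a flagged tweet. Illustrative only, not ModBench code.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def build_prompt(tweet: str, demos: list[tuple[str, str]], refs: list[str]) -> str:
    """Assemble a prompt from (tweet, note) demonstrations and reference URLs."""
    parts = [
        "You are drafting a Community Note that corrects a misleading tweet.",
        "Be accurate and concise, and cite only the references provided.\n",
    ]
    for i, (demo_tweet, demo_note) in enumerate(demos, 1):
        parts.append(f"Example {i}\nTweet: {demo_tweet}\nNote: {demo_note}\n")
    parts.append("References:\n" + "\n".join(f"- {r}" for r in refs))
    parts.append(f"\nTweet: {tweet}\nNote:")
    return "\n".join(parts)


def generate_note(tweet: str, demos: list[tuple[str, str]], refs: list[str],
                  model: str = "gpt-4o") -> str:
    """Query the model for a CN-style moderation note."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": build_prompt(tweet, demos, refs)}],
        temperature=0,  # low-variance decoding for benchmark comparability
    )
    return resp.choices[0].message.content
```

The generated note could then be compared against the human-written CN for the same tweet on the qualities the abstract names: accuracy, coherence, and citation reliability.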

Cite

Text

Singh et al. "On the Limitations of LLM-Synthesized Social Media Misinformation Moderation." ICLR 2025 Workshops: ICBINB, 2025.

Markdown

[Singh et al. "On the Limitations of LLM-Synthesized Social Media Misinformation Moderation." ICLR 2025 Workshops: ICBINB, 2025.](https://mlanthology.org/iclrw/2025/singh2025iclrw-limitations/)

BibTeX

@inproceedings{singh2025iclrw-limitations,
  title     = {{On the Limitations of LLM-Synthesized Social Media Misinformation Moderation}},
  author    = {Singh, Sahajpreet and Wu, Jiaying and Churina, Svetlana and Jaidka, Kokil},
  booktitle = {ICLR 2025 Workshops: ICBINB},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/singh2025iclrw-limitations/}
}