SAMAT: A Stereotype-Aware Multimodal Transformer for Interpretable Misogynistic Meme Detection

Abstract

This paper introduces SAMAT, a Stereotype-Aware Multimodal Alignment Transformer for detecting and explaining implicit misogyny in memes, where harm arises from subtle visual-textual incongruity and cultural stereotypes. SAMAT integrates three components: a Stereotype Subspace Projection Module (SSPM) that structures representations; a fidelity-based retrieval mechanism aligned with a curated Rationale Bank; and an evidence-conditioned explanation generator. For evaluation, we use the MEE corpus, which contains 8,000 explanations, together with two metrics: the Stereotype Alignment Score (SAS) and the Contextual Faithfulness Score (CFS). Experiments show that SAMAT achieves a Macro-F1 of 88.1%, surpassing multimodal large language model (MLLM) baselines, while improving retrieval faithfulness (SAS: 0.78) and explanation grounding (CFS: 0.68). Ablations confirm that the gains stem from structured stereotype projection and evidential retrieval rather than from model scale. SAMAT offers a transparent, culturally grounded framework for accountable content moderation, aligning with Responsible AI objectives.

Cite

Text

Singh et al. "SAMAT: A Stereotype-Aware Multimodal Transformer for Interpretable Misogynistic Meme Detection." Transactions on Machine Learning Research, 2026.

Markdown

[Singh et al. "SAMAT: A Stereotype-Aware Multimodal Transformer for Interpretable Misogynistic Meme Detection." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/singh2026tmlr-samat/)

BibTeX

@article{singh2026tmlr-samat,
  title     = {{SAMAT: A Stereotype-Aware Multimodal Transformer for Interpretable Misogynistic Meme Detection}},
  author    = {Singh, Gopendra Vikram and Phukan, Arpan and Kanwar, Kushal and Ekbal, Asif},
  journal   = {Transactions on Machine Learning Research},
  year      = {2026},
  url       = {https://mlanthology.org/tmlr/2026/singh2026tmlr-samat/}
}