SAMAT: A Stereotype-Aware Multimodal Transformer for Interpretable Misogynistic Meme Detection
Abstract
This paper introduces SAMAT, a Stereotype-Aware Multimodal Alignment Transformer for detecting and explaining implicit misogyny in memes, where harm arises from subtle visual-textual incongruity and cultural stereotypes. SAMAT integrates three components: a Stereotype Subspace Projection Module (SSPM) that structures representations; a fidelity-based retrieval mechanism aligned with a curated Rationale Bank; and an evidence-conditioned explanation generator. For evaluation, we use the MEE corpus of 8,000 explanations together with two metrics, the Stereotype Alignment Score (SAS) and the Contextual Faithfulness Score (CFS). Experiments show that SAMAT achieves a Macro-F1 of 88.1%, surpassing MLLM baselines, while improving retrieval faithfulness (SAS: 0.78) and explanation grounding (CFS: 0.68). Ablations confirm that the gains stem from structured stereotype projection and evidential retrieval rather than from scale. SAMAT offers a transparent, culturally grounded framework for accountable content moderation, aligning with Responsible AI objectives.
Cite
Text
Singh et al. "SAMAT: A Stereotype-Aware Multimodal Transformer for Interpretable Misogynistic Meme Detection." Transactions on Machine Learning Research, 2026.
Markdown
[Singh et al. "SAMAT: A Stereotype-Aware Multimodal Transformer for Interpretable Misogynistic Meme Detection." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/singh2026tmlr-samat/)
BibTeX
@article{singh2026tmlr-samat,
title = {{SAMAT: A Stereotype-Aware Multimodal Transformer for Interpretable Misogynistic Meme Detection}},
author = {Singh, Gopendra Vikram and Phukan, Arpan and Kanwar, Kushal and Ekbal, Asif},
journal = {Transactions on Machine Learning Research},
year = {2026},
url = {https://mlanthology.org/tmlr/2026/singh2026tmlr-samat/}
}