Multimodal Understanding of Memes with Fair Explanations
Abstract
Digital memes are widely used in people's daily lives across social media platforms. Composed of images and descriptive texts, memes are often distributed with a flair of sarcasm or humor, yet they can also spread harmful content or biases rooted in social and cultural factors. Beyond mainstream tasks such as meme generation and classification, generating explanations for memes has become increasingly vital and poses the challenge of avoiding the propagation of already embedded biases. Our work studies whether recent advanced Vision-Language models (VL models) can fairly explain meme content from different domains and topics, contributing a unified benchmark for meme explanation. With this dataset, we semi-automatically and manually evaluate the quality of VL-model-generated explanations, identifying the major categories of biases in meme explanations.
Cite
Text
Zhong and Baghel. "Multimodal Understanding of Memes with Fair Explanations." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024. doi:10.1109/CVPRW63382.2024.00206
Markdown
[Zhong and Baghel. "Multimodal Understanding of Memes with Fair Explanations." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024.](https://mlanthology.org/cvprw/2024/zhong2024cvprw-multimodal/) doi:10.1109/CVPRW63382.2024.00206
BibTeX
@inproceedings{zhong2024cvprw-multimodal,
title = {{Multimodal Understanding of Memes with Fair Explanations}},
author = {Zhong, Yang and Baghel, Bhiman Kumar},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2024},
pages = {2007--2017},
doi = {10.1109/CVPRW63382.2024.00206},
url = {https://mlanthology.org/cvprw/2024/zhong2024cvprw-multimodal/}
}