Kiki or Bouba? Sound Symbolism in Vision-and-Language Models

Abstract

Although the mapping between sound and meaning in human language is assumed to be largely arbitrary, research in cognitive science has shown that there are non-trivial correlations between particular sounds and meanings across languages and demographic groups, a phenomenon known as sound symbolism. Among the many dimensions of meaning, sound symbolism is particularly salient and well-demonstrated with regard to cross-modal associations between language and the visual domain. In this work, we address the question of whether sound symbolism is reflected in vision-and-language models such as CLIP and Stable Diffusion. Using zero-shot knowledge probing to investigate the inherent knowledge of these models, we find strong evidence that they do show this pattern, paralleling the well-known kiki-bouba effect in psycholinguistics. Our work provides a novel method for demonstrating sound symbolism and understanding its nature using computational tools. Our code will be made publicly available.

Cite

Text

Alper and Averbuch-Elor. "Kiki or Bouba? Sound Symbolism in Vision-and-Language Models." Neural Information Processing Systems, 2023.

Markdown

[Alper and Averbuch-Elor. "Kiki or Bouba? Sound Symbolism in Vision-and-Language Models." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/alper2023neurips-kiki/)

BibTeX

@inproceedings{alper2023neurips-kiki,
  title     = {{Kiki or Bouba? Sound Symbolism in Vision-and-Language Models}},
  author    = {Alper, Morris and Averbuch-Elor, Hadar},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/alper2023neurips-kiki/}
}