Phonovisual Biases in Language: Is the Lexicon Tied to the Visual World?

Abstract

The present paper addresses the study of cross-linguistic and cross-modal iconicity within a deep learning framework. An LSTM-based Recurrent Neural Network is trained to associate the phonetic representation of a concrete word, encoded as a sequence of feature vectors, with the visual representation of its referent, expressed as an HCNN-transformed image. The processing network is then tested, without further training, on a language that does not appear in the training set and belongs to a different language family. The performance of the model is evaluated through a comparison with a randomized baseline; we show that such an imaginative network is capable of extracting language-independent generalizations in the mapping from linguistic sounds to visual features, providing empirical support for the hypothesis of a universal sound-symbolic substrate underlying all languages.
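
The abstract describes a network that reads a word's phonetic feature vectors with an LSTM and predicts the visual features of the word's referent. The snippet below is a minimal sketch of such a phoneme-to-vision mapping in PyTorch; the feature dimensions, module names, and mean-squared-error loss are illustrative assumptions, not the authors' actual implementation or hyperparameters.

# Minimal sketch (assumed PyTorch setup, hypothetical dimensions): an LSTM encoder
# maps a sequence of phonetic feature vectors to a fixed-size visual feature vector,
# which is compared against image features extracted by a pretrained CNN.
import torch
import torch.nn as nn

class PhonToVis(nn.Module):
    def __init__(self, phon_dim=22, hidden_dim=256, vis_dim=512):
        super().__init__()
        self.lstm = nn.LSTM(phon_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vis_dim)

    def forward(self, phon_seq):
        # phon_seq: (batch, seq_len, phon_dim) sequence of phonetic feature vectors
        _, (h_n, _) = self.lstm(phon_seq)
        return self.proj(h_n[-1])  # predicted visual feature vector

model = PhonToVis()
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy batch: 8 words of 10 phonemes each, paired with 512-d image feature targets.
phon_batch = torch.randn(8, 10, 22)
vis_batch = torch.randn(8, 512)

optimizer.zero_grad()
loss = loss_fn(model(phon_batch), vis_batch)
loss.backward()
optimizer.step()

At test time, the same (frozen) model would be applied to words from a held-out language and scored against a randomized baseline, as described above.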

Cite

Text

de Varda and Strapparava. "Phonovisual Biases in Language: Is the Lexicon Tied to the Visual World?" International Joint Conference on Artificial Intelligence, 2021. doi:10.24963/IJCAI.2021/89

Markdown

[de Varda and Strapparava. "Phonovisual Biases in Language: Is the Lexicon Tied to the Visual World?" International Joint Conference on Artificial Intelligence, 2021.](https://mlanthology.org/ijcai/2021/devarda2021ijcai-phonovisual/) doi:10.24963/IJCAI.2021/89

BibTeX

@inproceedings{devarda2021ijcai-phonovisual,
  title     = {{Phonovisual Biases in Language: Is the Lexicon Tied to the Visual World?}},
  author    = {de Varda, Andrea Gregor and Strapparava, Carlo},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2021},
  pages     = {643--649},
  doi       = {10.24963/IJCAI.2021/89},
  url       = {https://mlanthology.org/ijcai/2021/devarda2021ijcai-phonovisual/}
}