Phonovisual Biases in Language: Is the Lexicon Tied to the Visual World?
Abstract
The present paper addresses the study of cross-linguistic and cross-modal iconicity within a deep learning framework. An LSTM-based Recurrent Neural Network is trained to associate the phonetic representation of a concrete word, encoded as a sequence of feature vectors, with the visual representation of its referent, expressed as an HCNN-transformed image. The network is then tested, without further training, on a language that does not appear in the training set and belongs to a different language family. The performance of the model is evaluated through a comparison with a randomized baseline; we show that such an imaginative network is capable of extracting language-independent generalizations in the mapping from linguistic sounds to visual features, providing empirical support for the hypothesis of a universal sound-symbolic substrate underlying all languages.
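As a rough illustration of the setup described in the abstract, the sketch below shows an LSTM that maps a sequence of phonetic feature vectors to a fixed-size visual embedding. This is not the authors' implementation: the framework (PyTorch), the feature and embedding dimensions (22-dimensional phonetic vectors, 512-dimensional image embeddings), the single-layer LSTM, the MSE loss, and the Adam optimizer are all illustrative assumptions.

```python
# Minimal sketch of a phonetic-to-visual mapping network.
# All hyperparameters and dimensions are assumptions, not the paper's values.
import torch
import torch.nn as nn

class PhonoVisualMapper(nn.Module):
    """Maps a sequence of phonetic feature vectors to a visual embedding."""
    def __init__(self, phon_dim=22, hidden_dim=128, visual_dim=512):
        super().__init__()
        self.lstm = nn.LSTM(phon_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, visual_dim)

    def forward(self, phonetic_seq):
        # phonetic_seq: (batch, seq_len, phon_dim)
        _, (h_n, _) = self.lstm(phonetic_seq)
        # Use the final hidden state as a summary of the word's sound shape.
        return self.proj(h_n[-1])

# One training step against CNN-derived image embeddings of the referents
# (dummy tensors stand in for real phonetic sequences and image features).
model = PhonoVisualMapper()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

phonetic_batch = torch.randn(32, 10, 22)   # dummy phonetic feature sequences
visual_targets = torch.randn(32, 512)      # dummy image embeddings
optimizer.zero_grad()
loss = loss_fn(model(phonetic_batch), visual_targets)
loss.backward()
optimizer.step()
```

Under this reading, the zero-shot test reported in the abstract would amount to running the trained model on word/image pairs from a held-out language family, with no further gradient updates, and comparing its predictions against a randomized baseline.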
Cite
Text
de Varda and Strapparava. "Phonovisual Biases in Language: Is the Lexicon Tied to the Visual World?" International Joint Conference on Artificial Intelligence, 2021. doi:10.24963/IJCAI.2021/89
Markdown
[de Varda and Strapparava. "Phonovisual Biases in Language: Is the Lexicon Tied to the Visual World?" International Joint Conference on Artificial Intelligence, 2021.](https://mlanthology.org/ijcai/2021/devarda2021ijcai-phonovisual/) doi:10.24963/IJCAI.2021/89
BibTeX
@inproceedings{devarda2021ijcai-phonovisual,
title = {{Phonovisual Biases in Language: Is the Lexicon Tied to the Visual World?}},
author = {de Varda, Andrea Gregor and Strapparava, Carlo},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2021},
pages = {643--649},
doi = {10.24963/IJCAI.2021/89},
url = {https://mlanthology.org/ijcai/2021/devarda2021ijcai-phonovisual/}
}