A Theory of Unsupervised Translation for Understanding Animal Communication

Abstract

Unsupervised translation refers to the challenging task of translating between two languages without parallel translations, i.e., from two separate monolingual corpora without a Rosetta stone. We propose an information-theoretic framework for unsupervised translation that models the case where the source language is that of highly intelligent animals, such as whales, and the target language is a human language, such as English. In particular, there may be limited quantities of source data, the source and target languages may be quite different in nature, and few assumptions are made about the source language syntax. We apply our theory to a stylized setting of tree-based languages. Our analysis suggests that the amount of source data required for unsupervised translation is not significantly greater than that required for supervised translation. Our analysis is purely information-theoretic; issues of algorithmic efficiency are left for future work. We are motivated by an ambitious initiative to translate whale communication using modern machine translation techniques. The recordings of whale communication that are being collected have no parallel human-language data.

Cite

Text

Goldwasser et al. "A Theory of Unsupervised Translation for Understanding Animal Communication." NeurIPS 2022 Workshops: InfoCog, 2022.

Markdown

[Goldwasser et al. "A Theory of Unsupervised Translation for Understanding Animal Communication." NeurIPS 2022 Workshops: InfoCog, 2022.](https://mlanthology.org/neuripsw/2022/goldwasser2022neuripsw-theory/)

BibTeX

@inproceedings{goldwasser2022neuripsw-theory,
  title     = {{A Theory of Unsupervised Translation for Understanding Animal Communication}},
  author    = {Goldwasser, Shafi and Gruber, David and Kalai, Adam Tauman and Paradise, Orr},
  booktitle = {NeurIPS 2022 Workshops: InfoCog},
  year      = {2022},
  url       = {https://mlanthology.org/neuripsw/2022/goldwasser2022neuripsw-theory/}
}