Bootstrapping a High Quality Multilingual Multimodal Dataset for Bletchley

Abstract

Vision-language models have recently made impressive strides, primarily driven by large-scale training on web data. While pioneering works such as CLIP and ALIGN show significant improvements, they focus on English data, which is easy to source from the web. Towards serving non-English-speaking demographics, we consider various methods for generating multilingual data and find that a simple bootstrapping mechanism works surprisingly well. Specifically, using only English image-caption data and text-only multilingual translation pairs, we train a fairly strong multilingual vision-language model and then leverage it to create a much cleaner version of the multilingual image-caption dataset we collected. We demonstrate that training Bletchley on this dataset yields a strong multimodal and multilingual model with strong performance across several multilingual zero-shot tasks. Specifically, Bletchley achieves state-of-the-art results on multilingual COCO, Multi30k, IGLUE WIT and xFlickr&CO datasets.
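The bootstrapped cleaning step described above can be sketched as a similarity-based filter: score each (image, caption) pair with the intermediate vision-language model and keep only pairs above a threshold. The sketch below is a minimal illustration, not the paper's implementation; the embedding functions are hypothetical stand-ins (deterministic pseudo-random vectors) for the model's real image and multilingual text encoders, and the threshold value is illustrative.

```python
import math
import random

DIM = 8  # toy embedding dimension

def embed_image(image_id: str) -> list:
    # Stand-in for the model's image encoder (hypothetical).
    rng = random.Random(image_id)
    return [rng.gauss(0, 1) for _ in range(DIM)]

def embed_caption(caption: str) -> list:
    # Stand-in for the model's multilingual text encoder (hypothetical).
    rng = random.Random(caption)
    return [rng.gauss(0, 1) for _ in range(DIM)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def filter_pairs(pairs, threshold=0.0):
    """Keep (image_id, caption) pairs whose similarity exceeds threshold."""
    kept = []
    for image_id, caption in pairs:
        score = cosine(embed_image(image_id), embed_caption(caption))
        if score > threshold:
            kept.append((image_id, caption, score))
    return kept

noisy = [
    ("img_001", "a dog running on the beach"),
    ("img_001", "buy stock photos online"),   # likely a noisy caption
    ("img_002", "ein Hund am Strand"),        # multilingual caption
]
clean = filter_pairs(noisy, threshold=0.0)
```

In the actual pipeline, the scores would come from the bootstrapped multilingual model, and the retained pairs form the cleaner dataset used to train the final model.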

Cite

Text

Mohammed et al. "Bootstrapping a High Quality Multilingual Multimodal Dataset for Bletchley." Proceedings of The 14th Asian Conference on Machine Learning, 2022.

Markdown

[Mohammed et al. "Bootstrapping a High Quality Multilingual Multimodal Dataset for Bletchley." Proceedings of The 14th Asian Conference on Machine Learning, 2022.](https://mlanthology.org/acml/2022/mohammed2022acml-bootstrapping/)

BibTeX

@inproceedings{mohammed2022acml-bootstrapping,
  title     = {{Bootstrapping a High Quality Multilingual Multimodal Dataset for Bletchley}},
  author    = {Mohammed, Owais Khan and Aggarwal, Kriti and Liu, Qiang and Singhal, Saksham and Bjorck, Johan and Som, Subhojit},
  booktitle = {Proceedings of The 14th Asian Conference on Machine Learning},
  year      = {2022},
  pages     = {738--753},
  volume    = {189},
  url       = {https://mlanthology.org/acml/2022/mohammed2022acml-bootstrapping/}
}