Learning Cross-Modal Embeddings for Cooking Recipes and Food Images
Abstract
In this paper, we introduce Recipe1M, a new large-scale, structured corpus of over 1m cooking recipes and 800k food images. As the largest publicly available collection of recipe data, Recipe1M affords the ability to train high-capacity models on aligned, multi-modal data. Accordingly, we train a neural network to find a joint embedding of recipes and images that yields impressive results on an image-recipe retrieval task. Additionally, we demonstrate that regularization via the addition of a high-level, semantic classification objective improves performance to rival that of humans and enables semantic vector arithmetic. We postulate that these embeddings will provide a basis for further exploration of the Recipe1M dataset and food and cooking in general.
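The following is a minimal, hypothetical sketch of the kind of objective the abstract describes: two modality-specific encoders projected into a shared space, a retrieval term that aligns matching image-recipe pairs, and a semantic classification head applied to both embeddings as a regularizer. The architecture, dimensions, margin, and loss weight below are illustrative assumptions, not the authors' implementation.

# Hedged sketch (PyTorch). Feature dimensions, class count, and loss
# weights are placeholders; only the overall structure mirrors the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedding(nn.Module):
    def __init__(self, img_dim=2048, rec_dim=1024, emb_dim=1024, n_classes=1048):
        super().__init__()
        # Project pre-extracted image and recipe features into a shared space.
        self.img_proj = nn.Linear(img_dim, emb_dim)
        self.rec_proj = nn.Linear(rec_dim, emb_dim)
        # Shared high-level semantic classifier used to regularize both modalities.
        self.classifier = nn.Linear(emb_dim, n_classes)

    def forward(self, img_feats, rec_feats):
        img_emb = F.normalize(self.img_proj(img_feats), dim=-1)
        rec_emb = F.normalize(self.rec_proj(rec_feats), dim=-1)
        return img_emb, rec_emb

def joint_loss(img_emb, rec_emb, classifier, labels, margin=0.3, w_cls=0.02):
    # Retrieval term: matching image/recipe pairs should be more similar
    # (cosine) than mismatched pairs; negatives come from a shifted batch.
    pos = F.cosine_similarity(img_emb, rec_emb)
    neg = F.cosine_similarity(img_emb, rec_emb.roll(1, dims=0))
    retrieval = F.relu(margin - pos + neg).mean()
    # Semantic regularizer: both embeddings predict the same coarse class.
    cls = F.cross_entropy(classifier(img_emb), labels) \
        + F.cross_entropy(classifier(rec_emb), labels)
    return retrieval + w_cls * cls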
Cite
Text
Salvador et al. "Learning Cross-Modal Embeddings for Cooking Recipes and Food Images." Conference on Computer Vision and Pattern Recognition, 2017. doi:10.1109/CVPR.2017.327
Markdown
[Salvador et al. "Learning Cross-Modal Embeddings for Cooking Recipes and Food Images." Conference on Computer Vision and Pattern Recognition, 2017.](https://mlanthology.org/cvpr/2017/salvador2017cvpr-learning/) doi:10.1109/CVPR.2017.327
BibTeX
@inproceedings{salvador2017cvpr-learning,
title = {{Learning Cross-Modal Embeddings for Cooking Recipes and Food Images}},
author = {Salvador, Amaia and Hynes, Nicholas and Aytar, Yusuf and Marin, Javier and Ofli, Ferda and Weber, Ingmar and Torralba, Antonio},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2017},
doi = {10.1109/CVPR.2017.327},
url = {https://mlanthology.org/cvpr/2017/salvador2017cvpr-learning/}
}