Pic2Prep: A Multimodal Conversational Agent for Cooking Assistance

Mana, Renjith Prasad Kaippilly; Shyalika, Chathurangi; Venkataramanan, Revathy; Eswaramoorthi, Darssan; Sheth, Amit P.

doi:10.1609/AAAI.V39I28.35359

AAAI 2025 pp. 29661-29663

Abstract

As the demand for healthier, personalized culinary experiences grows, so does the need for food computation models that offer more than basic nutritional insights. Current models, however, lack the depth to provide actionable guidance such as ingredient substitutions or alternative cooking actions suited to users' dietary goals. To address this gap, we introduce and demonstrate Pic2Prep, a multimodal conversational system that generates detailed cooking instructions, cooking actions, and ingredient lists from user-provided images and text. The system is built on a novel dataset generated with Stable Diffusion: recipe titles and ingredient lists from the Recipe1M dataset serve as inputs for synthesizing food images with variations. This dataset is used to fine-tune the Bootstrapping Language-Image Pre-training (BLIP) model to extract cooking instructions and ingredients from food images. Pic2Prep also employs CookGen, a small-scale custom generative model, to derive specific cooking actions from those instructions. A custom mapper built on the Mistral model then links each action to its corresponding ingredients, yielding a comprehensive representation of the cooking process. Through an interactive user interface, users can upload images and ask targeted questions, receiving responses in real time.
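The abstract describes a staged flow: a fine-tuned BLIP model turns a food image into instructions and ingredients, CookGen derives cooking actions from each instruction, and a Mistral-based mapper links each action to ingredients. The following is a minimal sketch of that data flow only, assuming each stage can be treated as a plain function. Every name here (`build_prompt`, `run_pipeline`, `CookingStep`, the `toy_*` stand-ins) and the prompt template are hypothetical illustrations, not the authors' code; the real stages are learned models, not the string rules used below.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stage signatures. In Pic2Prep these roles are played by a
# fine-tuned BLIP model, CookGen, and a Mistral-based mapper; here they are
# plain callables so the data flow can run without any model weights.
ImageToRecipe = Callable[[bytes], Dict[str, List[str]]]
InstructionToActions = Callable[[str], List[str]]
ActionMapper = Callable[[str, str, List[str]], List[str]]


def build_prompt(title: str, ingredients: List[str], variation: str) -> str:
    """Hypothetical text-to-image prompt from a Recipe1M title and ingredient list."""
    return f"{variation} photo of {title}, made with {', '.join(ingredients)}"


@dataclass
class CookingStep:
    instruction: str
    # Maps each derived cooking action to the ingredients it acts on.
    actions: Dict[str, List[str]] = field(default_factory=dict)


def run_pipeline(image: bytes, extract: ImageToRecipe,
                 derive_actions: InstructionToActions,
                 map_action: ActionMapper) -> List[CookingStep]:
    """Image -> instructions/ingredients -> actions -> action-ingredient links."""
    recipe = extract(image)
    ingredients = recipe["ingredients"]
    steps: List[CookingStep] = []
    for instruction in recipe["instructions"]:
        step = CookingStep(instruction)
        for action in derive_actions(instruction):
            step.actions[action] = map_action(instruction, action, ingredients)
        steps.append(step)
    return steps


# Toy stand-ins (hypothetical), used only to exercise the pipeline.
def toy_extract(image: bytes) -> Dict[str, List[str]]:
    return {"instructions": ["Chop the onion.", "Fry the onion in oil."],
            "ingredients": ["onion", "oil"]}


def toy_actions(instruction: str) -> List[str]:
    # Treat the leading verb as the single cooking action.
    return [instruction.split()[0].lower()]


def toy_mapper(instruction: str, action: str, ingredients: List[str]) -> List[str]:
    # Link every ingredient that is mentioned in the instruction text.
    text = instruction.lower()
    return [ing for ing in ingredients if ing in text]


prompts = [build_prompt("Onion Soup", ["onion", "butter"], v)
           for v in ("overhead", "close-up")]
steps = run_pipeline(b"", toy_extract, toy_actions, toy_mapper)
for s in steps:
    print(s.instruction, s.actions)
```

Keeping each stage behind a callable makes the orchestration independent of how a stage is implemented, so a toy rule and a fine-tuned model are interchangeable in `run_pipeline`. The `prompts` list only hints at how varied image prompts might be derived from one recipe; the paper's actual prompting scheme is not given here.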

Cite

Text

Mana et al. "Pic2Prep: A Multimodal Conversational Agent for Cooking Assistance." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I28.35359

Markdown

[Mana et al. "Pic2Prep: A Multimodal Conversational Agent for Cooking Assistance." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/mana2025aaai-pic/) doi:10.1609/AAAI.V39I28.35359

BibTeX

@inproceedings{mana2025aaai-pic,
  title     = {{Pic2Prep: A Multimodal Conversational Agent for Cooking Assistance}},
  author    = {Mana, Renjith Prasad Kaippilly and Shyalika, Chathurangi and Venkataramanan, Revathy and Eswaramoorthi, Darssan and Sheth, Amit P.},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {29661-29663},
  doi       = {10.1609/AAAI.V39I28.35359},
  url       = {https://mlanthology.org/aaai/2025/mana2025aaai-pic/}
}