DPO-Finetuned Large Multi-Modal Planner with Retrieval-Augmented Generation @ EgoPlan Challenge ICML 2024

Abstract

This paper presents the technical details of our approach to EgoPlan-Bench, a multi-modal task. Our model adopts Direct Preference Optimization (DPO), originally developed for single-modal tasks, and adapts it to a multi-modal setting. This DPO adaptation improves prediction accuracy by emphasizing positive answers over negative choices. Additionally, we apply Retrieval-Augmented Generation (RAG) to further enhance the generation performance of Multi-modal Large Language Models (MLLMs). However, in our setting, RAG does not yield a performance improvement because few similar tasks can be retrieved. Our DPO-based model achieves 53.98% test accuracy, compared to 41.35% for the baseline method. Our code is available at https://github.com/aailabkaist/EgoPlan_Challenge_Team_AAILab.
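For context, the standard DPO objective that the paper adapts trains the policy to assign a higher likelihood to the positive (chosen) answer than to the negative (rejected) one, relative to a frozen reference model. Below is a minimal PyTorch sketch of that loss; the function and variable names are illustrative, and it assumes per-answer sequence log-probabilities have already been computed under both the finetuned policy and the reference MLLM (the multi-modal conditioning is folded into those log-probabilities).

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO loss over a batch of preference pairs.

    Each *_logps tensor holds the summed log-probability of an answer
    sequence (positive or negative plan choice), conditioned on the
    same multi-modal context, under the policy or the frozen reference.
    """
    # Implicit rewards: log-ratio of policy to reference, scaled by beta.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # -log sigmoid of the reward margin; minimized when the policy
    # prefers the chosen answer more strongly than the reference does.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

This sketch follows the original single-modal DPO formulation; the paper's contribution is applying it to preference pairs drawn from the multi-modal EgoPlan setting, where positive and negative answers are candidate plans for the same egocentric visual context.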

Cite

Text

Lee et al. "DPO-Finetuned Large Multi-Modal Planner with Retrieval-Augmented Generation @ EgoPlan Challenge ICML 2024." ICML 2024 Workshops: MFM-EAI, 2024.

Markdown

[Lee et al. "DPO-Finetuned Large Multi-Modal Planner with Retrieval-Augmented Generation @ EgoPlan Challenge ICML 2024." ICML 2024 Workshops: MFM-EAI, 2024.](https://mlanthology.org/icmlw/2024/lee2024icmlw-dpofinetuned/)

BibTeX

@inproceedings{lee2024icmlw-dpofinetuned,
  title     = {{DPO-Finetuned Large Multi-Modal Planner with Retrieval-Augmented Generation @ EgoPlan Challenge ICML 2024}},
  author    = {Lee, Kwanghyeon and Kang, Mina and Na, Hyungho and Bae, HeeSun and Na, Byeonghu and Kwon, Doyun and Shin, Seungjae and Kim, Yeongmin and Kim, Taewoo and Yun, Seungmin and Moon, Il-chul},
  booktitle = {ICML 2024 Workshops: MFM-EAI},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/lee2024icmlw-dpofinetuned/}
}