Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models
Abstract
Large language models exhibit enhanced zero-shot performance on various tasks when fine-tuned with instruction-following data. Multimodal instruction-following models extend these capabilities by integrating both text and images. However, existing models such as MiniGPT-4 and LLaVA face challenges in maintaining dialogue coherence in scenarios involving multiple images. A primary reason is the lack of a specialized dataset for this critical application. To bridge this gap, we introduce SparklesDialogue, the first machine-generated dialogue dataset tailored for word-level interleaved multi-image and text interactions. Furthermore, we construct SparklesEval, a GPT-assisted benchmark for quantitatively assessing a model's conversational competence across multiple images and dialogue turns. We then present SparklesChat, a multimodal instruction-following model for open-ended dialogues across multiple images. Our experiments validate the effectiveness of training SparklesChat with SparklesDialogue based on MiniGPT-4 and LLaVA-v1.5, which enhances comprehension across multiple images and dialogue turns without compromising single-image understanding capabilities. Qualitative evaluations further demonstrate SparklesChat's generality in handling real-world applications. All resources related to this study are publicly available at https://github.com/HYPJUDY/Sparkles.
Cite
Text
Huang et al. "Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models." ICLR 2024 Workshops: DPFM, 2024.
Markdown
[Huang et al. "Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models." ICLR 2024 Workshops: DPFM, 2024.](https://mlanthology.org/iclrw/2024/huang2024iclrw-sparkles/)
BibTeX
@inproceedings{huang2024iclrw-sparkles,
title = {{Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models}},
author = {Huang, Yupan and Meng, Zaiqiao and Liu, Fangyu and Su, Yixuan and Collier, Nigel and Lu, Yutong},
booktitle = {ICLR 2024 Workshops: DPFM},
year = {2024},
url = {https://mlanthology.org/iclrw/2024/huang2024iclrw-sparkles/}
}