Unifying Image Processing as Visual Prompting Question Answering

Abstract

Image processing is a fundamental task in computer vision, which aims at enhancing image quality and extracting essential features for subsequent vision applications. Traditionally, task-specific models are developed for individual tasks, and designing such models requires distinct expertise. Building upon the success of large language models (LLMs) in natural language processing (NLP), there is a similar trend in computer vision, which focuses on developing large-scale models through pretraining and in-context learning. This paradigm shift reduces the reliance on task-specific models, yielding a powerful unified model that handles various tasks. However, these advances have predominantly concentrated on high-level vision tasks, with less attention paid to low-level vision tasks. To address this issue, we propose a universal model for general image processing that covers image restoration, image enhancement, image feature extraction, and related tasks. Our proposed framework, named PromptGIP, unifies these diverse image processing tasks. Inspired by NLP question answering (QA) techniques, we employ a visual prompting question answering paradigm. Specifically, we treat the input-output image pair as a structured question-answer sentence, thereby reprogramming the image processing task as a prompting QA problem. PromptGIP can undertake diverse cross-domain tasks using provided visual prompts, eliminating the need for task-specific finetuning. Capable of handling up to 15 different image processing tasks, PromptGIP represents a versatile and adaptive approach to general image processing. While PromptGIP has demonstrated a certain degree of out-of-domain task generalization capability, further research is needed to fully explore its potential for stronger emergent generalization. Code will be available at https://github.com/lyh-18/PromptGIP.
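The question-answer framing described in the abstract can be illustrated with a small sketch. This is not PromptGIP's implementation: the function `build_qa_sequence`, the `MASK` placeholder, the "Q"/"A" role labels, and the toy `invert` task are all hypothetical names chosen for illustration. It assumes only what the abstract states, namely that an input-output image pair acts as a question-answer example and that a new input is posed as a question whose answer is to be produced.

```python
from typing import List, Optional, Tuple

# A tiny grayscale image as nested lists (illustrative stand-in for real tensors).
Image = List[List[float]]

# Hypothetical placeholder marking the answer slot a model would fill in.
MASK = None


def build_qa_sequence(
    prompt_pairs: List[Tuple[Image, Image]],
    query: Image,
) -> List[Tuple[str, Optional[Image]]]:
    """Arrange prompt pairs and a query as Q, A, ..., Q, <masked A>."""
    sequence: List[Tuple[str, Optional[Image]]] = []
    for question, answer in prompt_pairs:
        sequence.append(("Q", question))  # prompt input image
        sequence.append(("A", answer))    # prompt output image (defines the task)
    sequence.append(("Q", query))         # new input image
    sequence.append(("A", MASK))          # answer left blank for prediction
    return sequence


def invert(img: Image) -> Image:
    """Toy 'task' used only to produce a prompt answer: intensity inversion."""
    return [[1.0 - p for p in row] for row in img]


prompt_in = [[0.0, 1.0], [0.5, 0.25]]
query_in = [[0.2, 0.8], [1.0, 0.0]]

seq = build_qa_sequence([(prompt_in, invert(prompt_in))], query_in)
roles = [role for role, _ in seq]
print(roles)  # ['Q', 'A', 'Q', 'A']
```

Under this framing, swapping the prompt pair (for example, a noisy image and its clean version instead of an image and its inverse) changes the task without changing the sequence layout, which is the sense in which no task-specific finetuning is required.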

Cite

Text

Liu et al. "Unifying Image Processing as Visual Prompting Question Answering." International Conference on Machine Learning, 2024.

Markdown

[Liu et al. "Unifying Image Processing as Visual Prompting Question Answering." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/liu2024icml-unifying/)

BibTeX

@inproceedings{liu2024icml-unifying,
  title     = {{Unifying Image Processing as Visual Prompting Question Answering}},
  author    = {Liu, Yihao and Chen, Xiangyu and Ma, Xianzheng and Wang, Xintao and Zhou, Jiantao and Qiao, Yu and Dong, Chao},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {30873--30891},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/liu2024icml-unifying/}
}