An Image Is Worth Multiple Words: Discovering Object Level Concepts Using Multi-Concept Prompt Learning
Abstract
Textual Inversion, a prompt learning method, learns a single text embedding for a new "word" to represent image style and appearance, allowing it to be integrated into natural language sentences to generate novel synthesised images. However, identifying multiple unknown object-level concepts within one scene remains a complex challenge. While recent methods have resorted to cropping or masking individual images to learn multiple concepts, these techniques often require prior knowledge of new concepts and are labour-intensive. To address this challenge, we introduce Multi-Concept Prompt Learning (MCPL), where multiple unknown "words" are simultaneously learned from a single sentence-image pair, without any image annotations. To enhance the accuracy of word-concept correlation and refine attention mask boundaries, we propose three regularisation techniques: Attention Masking, Prompts Contrastive Loss, and Bind Adjective. Extensive quantitative comparisons with both real-world categories and biomedical images demonstrate that our method can learn new semantically disentangled concepts. Our approach emphasises learning solely from textual embeddings, using less than 10% of the storage space required by other methods. The project page, code, and data are available at https://astrazeneca.github.io/mcpl.github.io.
Cite
Text
Jin et al. "An Image Is Worth Multiple Words: Discovering Object Level Concepts Using Multi-Concept Prompt Learning." International Conference on Machine Learning, 2024.
Markdown
[Jin et al. "An Image Is Worth Multiple Words: Discovering Object Level Concepts Using Multi-Concept Prompt Learning." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/jin2024icml-image/)
BibTeX
@inproceedings{jin2024icml-image,
title = {{An Image Is Worth Multiple Words: Discovering Object Level Concepts Using Multi-Concept Prompt Learning}},
author = {Jin, Chen and Tanno, Ryutaro and Saseendran, Amrutha and Diethe, Tom and Teare, Philip Alexander},
booktitle = {International Conference on Machine Learning},
year = {2024},
pages = {22210--22243},
volume = {235},
url = {https://mlanthology.org/icml/2024/jin2024icml-image/}
}