Composing Ensembles of Pre-Trained Models via Iterative Consensus

Abstract

Large pre-trained models exhibit distinct and complementary capabilities that depend on the data they are trained on. Language models such as GPT-3 are capable of textual reasoning but cannot understand visual information, while vision models such as DALL-E can generate photorealistic images but fail to understand complex language descriptions. In this work, we propose a unified framework for composing ensembles of different pre-trained models, combining the strengths of each individual model to solve various multimodal problems in a zero-shot manner. We use pre-trained models as "generators" or "scorers" and compose them via closed-loop iterative consensus optimization: the generator constructs proposals, and the scorers iteratively provide feedback to refine the generated result. Such closed-loop communication enables models to correct errors caused by other models, significantly boosting performance on downstream tasks (e.g., improving accuracy on grade school math problems by 7.5%) without requiring any model finetuning. We demonstrate that the consensus achieved by an ensemble of scorers outperforms the feedback of a single scorer by leveraging the strengths of each expert model. Results show that the proposed method can serve as a general-purpose framework for a wide range of zero-shot multimodal tasks, such as image generation, video question answering, mathematical reasoning, and robotic manipulation.
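The closed-loop generator/scorer interaction can be pictured as a gradient-free propose-and-select loop. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The generator and the two scorers are toy numeric stand-ins rather than pre-trained models such as GPT-3, DALL-E, or CLIP. "Consensus" is assumed to be the sum of the scorers' outputs. The names `iterative_consensus`, `consensus_score`, `gaussian_generator`, `scorer_a`, and `scorer_b` are hypothetical, and the optimization procedure used in the paper may differ from task to task.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical signatures (assumptions, not from the paper):
# a proposal function samples candidates around the current best result,
# and a score function rates a candidate (higher is better).
ProposalFn = Callable[[float, int, random.Random], List[float]]
ScoreFn = Callable[[float], float]


def consensus_score(candidate: float, scorers: Sequence[ScoreFn]) -> float:
    """Ensemble consensus, assumed here to be the plain sum of all scores."""
    return sum(score(candidate) for score in scorers)


def iterative_consensus(
    generator: ProposalFn,
    scorers: Sequence[ScoreFn],
    init: float,
    num_iters: int = 20,
    num_proposals: int = 8,
    seed: int = 0,
) -> float:
    """Closed loop: the generator proposes, the scorers' consensus selects."""
    rng = random.Random(seed)
    best = init
    best_score = consensus_score(best, scorers)
    for _ in range(num_iters):
        # The generator proposes new candidates conditioned on the current best.
        for cand in generator(best, num_proposals, rng):
            s = consensus_score(cand, scorers)
            # Scorer feedback decides which proposal is carried into the next round.
            if s > best_score:
                best, best_score = cand, s
    return best


def gaussian_generator(center: float, n: int, rng: random.Random) -> List[float]:
    """Toy generator: perturb the current best with unit Gaussian noise."""
    return [center + rng.gauss(0.0, 1.0) for _ in range(n)]


def scorer_a(x: float) -> float:
    """Toy expert that prefers values near 2.0."""
    return -(x - 2.0) ** 2


def scorer_b(x: float) -> float:
    """Toy expert that prefers values near 4.0."""
    return -(x - 4.0) ** 2


result = iterative_consensus(
    gaussian_generator, [scorer_a, scorer_b], init=0.0, num_iters=200
)
print(f"refined result: {result:.3f}")
```

Because each toy scorer prefers a different value, the summed objective peaks at 3.0, so the refined result lands near the point where the two experts agree rather than at either expert's individual optimum. Swapping the sum for a weighted sum or another aggregation rule is a design choice this sketch does not attempt to settle.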

Cite

Text

Li et al. "Composing Ensembles of Pre-Trained Models via Iterative Consensus." International Conference on Learning Representations, 2023.


BibTeX

@inproceedings{li2023iclr-composing,
  title     = {{Composing Ensembles of Pre-Trained Models via Iterative Consensus}},
  author    = {Li, Shuang and Du, Yilun and Tenenbaum, Joshua B. and Torralba, Antonio and Mordatch, Igor},
  booktitle = {International Conference on Learning Representations},
  year      = {2023},
  url       = {https://mlanthology.org/iclr/2023/li2023iclr-composing/}
}