FMBoost: Boosting Latent Diffusion with Flow Matching
Abstract
Visual synthesis has recently seen significant leaps in performance, largely due to breakthroughs in generative models. Diffusion models have been a key enabler, as they excel in image diversity. However, this comes at the cost of slow training and synthesis, which is only partially alleviated by latent diffusion. To this end, flow matching is an appealing approach due to its complementary characteristics of faster training and inference but less diverse synthesis. We demonstrate our FMBoost approach, which introduces flow matching between a frozen diffusion model and a convolutional decoder and thereby enables high-resolution image synthesis at reduced computational cost and model size. A small diffusion model can then effectively provide the necessary visual diversity, while flow matching efficiently enhances resolution and detail by mapping the small latent space to a high-dimensional one, producing high-resolution images. Combining the diversity of diffusion models, the efficiency of flow matching, and the effectiveness of convolutional decoders, state-of-the-art high-resolution image synthesis is achieved at 1024² pixels with minimal computational cost. Cascading FMBoost optionally boosts this further to 2048² pixels. Importantly, this approach is orthogonal to recent approximation and speed-up strategies for the underlying model, making it easily integrable into the various diffusion model frameworks.
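To make the idea of flow matching between a low-resolution and a high-resolution latent more concrete, the following is a minimal, generic sketch and not the authors' implementation. The abstract does not specify the interpolation path, the upsampling scheme, or the ODE solver, so the linear path, nearest-neighbour upsampling, and fixed-step Euler integration used here are assumptions, and the names `upsample_nearest`, `flow_matching_pair`, and `euler_integrate` are hypothetical.

```python
import numpy as np


def upsample_nearest(z, factor):
    """Nearest-neighbour upsampling of a (C, H, W) latent (assumed scheme)."""
    return z.repeat(factor, axis=1).repeat(factor, axis=2)


def flow_matching_pair(z_start, z_target, t):
    """Build one flow matching training pair on a linear path (assumption).

    Returns the interpolated point x_t = (1 - t) * z_start + t * z_target
    and the constant target velocity z_target - z_start that a network
    would be trained to regress at (x_t, t).
    """
    x_t = (1.0 - t) * z_start + t * z_target
    v_target = z_target - z_start
    return x_t, v_target


def euler_integrate(velocity_fn, x0, num_steps):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity_fn(x, t)
    return x
```

In such a setup, `z_start` would be the upsampled output of the small diffusion model and `z_target` the latent of the high-resolution image; at inference time a trained network would take the place of `velocity_fn`, and the integrated result would be passed to the convolutional decoder.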
Cite
Text
Fischer et al. "FMBoost: Boosting Latent Diffusion with Flow Matching." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73030-6_19
Markdown
[Fischer et al. "FMBoost: Boosting Latent Diffusion with Flow Matching." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/fischer2024eccv-fmboost/) doi:10.1007/978-3-031-73030-6_19
BibTeX
@inproceedings{fischer2024eccv-fmboost,
title = {{FMBoost: Boosting Latent Diffusion with Flow Matching}},
author = {Fischer, Johannes S and Gui, Ming and Ma, Pingchuan and Stracke, Nick and Baumann, Stefan Andreas and Hu, Vincent Tao and Ommer, Björn},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2024},
doi = {10.1007/978-3-031-73030-6_19},
url = {https://mlanthology.org/eccv/2024/fischer2024eccv-fmboost/}
}