RMFlow: Refined Mean Flow by a Noise-Injection Step for Multimodal Generation
Abstract
Mean flow (MeanFlow) enables efficient, high-fidelity image generation, yet its generation with a single function evaluation (1-NFE) often fails to yield compelling results. We address this issue by introducing RMFlow, an efficient multimodal generative model that combines a coarse 1-NFE MeanFlow transport with a subsequent, tailored noise-injection refinement step. RMFlow approximates the average velocity of the flow path with a neural network trained using a new loss function that balances minimizing the Wasserstein distance between probability paths against maximizing sample likelihood. RMFlow achieves near state-of-the-art results on text-to-image, context-to-molecule, and time-series generation using only 1-NFE, at a computational cost comparable to that of the baseline MeanFlow models.
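The two-stage sampling described above (a coarse one-step transport followed by noise injection) can be illustrated with a minimal sketch. Everything here is an assumption for illustration rather than the authors' implementation: `toy_mean_velocity` is a hypothetical closed-form stand-in for the trained average-velocity network, the target distribution is a single point `MU`, and the refinement is assumed to be additive Gaussian noise with a hypothetical scale `sigma`, since the abstract does not specify its form.

```python
import numpy as np

MU = 2.0  # toy target: all data mass at one point (assumption for illustration)


def toy_mean_velocity(z, r, t):
    """Hypothetical stand-in for a trained average-velocity network u(z, r, t).

    For the straight path z_t = (1 - t) * MU + t * eps, the velocity is
    eps - MU = (z_t - MU) / t, which is constant along the path, so the
    average velocity over [r, t] has the same value.
    """
    return (z - MU) / t


def one_step_sample(rng, n, sigma=0.1):
    """Coarse 1-NFE transport, then an assumed additive-Gaussian refinement."""
    eps = rng.standard_normal(n)  # prior noise at t = 1
    # One-step transport from t = 1 to r = 0: z_r = z_t - (t - r) * u(z_t, r, t)
    coarse = eps - (1.0 - 0.0) * toy_mean_velocity(eps, 0.0, 1.0)
    # Noise-injection step (assumed form; sigma is a hypothetical parameter)
    refined = coarse + sigma * rng.standard_normal(n)
    return coarse, refined


rng = np.random.default_rng(0)
coarse, refined = one_step_sample(rng, 1000)
```

In this toy setting the coarse samples all land on `MU`, and the refinement spreads them with standard deviation close to `sigma`. The network is evaluated once per sample, so the procedure remains 1-NFE.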
Cite
Text
Huang et al. "RMFlow: Refined Mean Flow by a Noise-Injection Step for Multimodal Generation." International Conference on Learning Representations, 2026.
Markdown
[Huang et al. "RMFlow: Refined Mean Flow by a Noise-Injection Step for Multimodal Generation." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/huang2026iclr-rmflow/)
BibTeX
@inproceedings{huang2026iclr-rmflow,
title = {{RMFlow: Refined Mean Flow by a Noise-Injection Step for Multimodal Generation}},
author = {Huang, Yuhao and Wang, Shih-Hsin and Bertozzi, Andrea L. and Wang, Bao},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/huang2026iclr-rmflow/}
}