RMFlow: Refined Mean Flow by a Noise-Injection Step for Multimodal Generation

Abstract

Mean flow (MeanFlow) enables efficient, high-fidelity image generation, yet its one-function-evaluation (1-NFE) sampling often falls short of compelling results. We address this issue by introducing RMFlow, an efficient multimodal generative model that combines a coarse 1-NFE MeanFlow transport with a subsequent, tailored noise-injection refinement step. RMFlow approximates the average velocity of the flow path with a neural network trained using a new loss function that balances minimizing the Wasserstein distance between probability paths against maximizing sample likelihood. RMFlow achieves near state-of-the-art results on text-to-image, context-to-molecule, and time-series generation using only 1 NFE, at a computational cost comparable to that of baseline MeanFlow models.
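
To make the two-stage sampling idea concrete, the following is a minimal one-dimensional sketch, not the authors' implementation. It assumes the usual MeanFlow one-step update, z_0 = z_1 - u(z_1, r=0, t=1), and it assumes the refinement is an additive Gaussian perturbation with a fixed scale `sigma`; the abstract does not specify the form of the noise injection, so that part is purely illustrative. The names `toy_average_velocity`, `coarse_one_step`, `refine_with_noise`, and `MU` are hypothetical, and the toy velocity stands in for a trained network.

```python
import random

MU = 2.0  # hypothetical target location for this toy example


def toy_average_velocity(z, r, t):
    """Hypothetical stand-in for a trained average-velocity network u(z, r, t).

    Only meaningful here for r = 0: if every path ends at the single point MU
    at time 0, the average velocity over [0, t] starting from z is (z - MU) / t.
    """
    return (z - MU) / (t - r)


def coarse_one_step(noise):
    """Coarse transport with one function evaluation (1-NFE).

    Assumed MeanFlow-style update: z_0 = z_1 - (1 - 0) * u(z_1, r=0, t=1).
    """
    return noise - toy_average_velocity(noise, 0.0, 1.0)


def refine_with_noise(x_coarse, sigma, rng):
    """Assumed refinement: add Gaussian noise of scale sigma to the coarse sample."""
    return x_coarse + sigma * rng.gauss(0.0, 1.0)


def sample(sigma, rng):
    """Draw prior noise, transport it in one step, then apply the refinement."""
    noise = rng.gauss(0.0, 1.0)
    return refine_with_noise(coarse_one_step(noise), sigma, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    print([round(sample(0.1, rng), 3) for _ in range(5)])
```

In this toy setting the coarse step maps every noise draw to `MU`, and the assumed refinement then spreads the samples around that point. No additional evaluation of the velocity function is needed for the refinement, which is in keeping with the abstract's claim of 1-NFE generation.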

Cite

Text

Huang et al. "RMFlow: Refined Mean Flow by a Noise-Injection Step for Multimodal Generation." International Conference on Learning Representations, 2026.

Markdown

[Huang et al. "RMFlow: Refined Mean Flow by a Noise-Injection Step for Multimodal Generation." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/huang2026iclr-rmflow/)

BibTeX

@inproceedings{huang2026iclr-rmflow,
  title     = {{RMFlow: Refined Mean Flow by a Noise-Injection Step for Multimodal Generation}},
  author    = {Huang, Yuhao and Wang, Shih-Hsin and Bertozzi, Andrea L. and Wang, Bao},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/huang2026iclr-rmflow/}
}