NOVA: A Benchmark for Rare Anomaly Localization and Clinical Reasoning in Brain MRI
Abstract
In many real-world applications, deployed models encounter inputs that differ from the data seen during training. Open-world recognition aims to keep such systems robust as ever-emerging, previously _unknown_ categories appear and must be handled without retraining. Foundation and vision-language models are pre-trained on large, diverse datasets with the expectation that they generalize broadly across domains, including medical imaging. However, benchmarking these models on test sets with only a few common outlier types silently collapses the evaluation back into a closed-set problem, masking failures on the rare or truly novel conditions encountered in clinical use. We therefore present NOVA, a challenging, real-world, _evaluation-only_ benchmark of about 900 brain MRI scans spanning 281 rare pathologies and heterogeneous acquisition protocols. Each case includes a rich clinical narrative and double-blinded expert bounding-box annotations. Together, these enable joint assessment of anomaly localization, visual captioning, and diagnostic reasoning. Because NOVA is never used for training, it serves as an _extreme_ stress test of out-of-distribution generalization: models must bridge a distribution gap both in sample appearance and in semantic space. Baseline results with leading vision-language models (GPT-4o, Gemini 2.0 Flash, and Qwen2.5-VL-72B) reveal substantial performance drops: a gap of approximately 65% in localization relative to natural-image benchmarks, and gaps of 40% and 20% in captioning and reasoning, respectively, relative to resident radiologists. NOVA thus establishes a testbed for advancing models that can detect, localize, and reason about truly unknown anomalies.
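Localization against expert bounding boxes is commonly scored with intersection-over-union (IoU). The sketch below shows that generic technique; it is not taken from NOVA, and the `Box` type, the `is_hit` rule, and the 0.5 threshold are hypothetical illustrations rather than the benchmark's actual protocol.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box in pixel coordinates (hypothetical format)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def area(b: Box) -> float:
    # Degenerate boxes (max <= min along an axis) get zero area.
    return max(0.0, b.x_max - b.x_min) * max(0.0, b.y_max - b.y_min)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes, in [0, 1]."""
    inter = Box(max(a.x_min, b.x_min), max(a.y_min, b.y_min),
                min(a.x_max, b.x_max), min(a.y_max, b.y_max))
    inter_area = area(inter)
    union = area(a) + area(b) - inter_area
    return inter_area / union if union > 0 else 0.0


def is_hit(pred: Box, truths: list[Box], threshold: float = 0.5) -> bool:
    """Hypothetical hit rule: the prediction counts as a correct localization
    if it overlaps any expert box with IoU >= threshold."""
    return any(iou(pred, t) >= threshold for t in truths)


if __name__ == "__main__":
    pred = Box(0, 0, 10, 10)
    print(iou(pred, Box(5, 5, 15, 15)))       # 25 / 175, about 0.143
    print(is_hit(pred, [Box(1, 1, 10, 10)]))  # IoU 0.81, so True
```

Because a prediction only needs to match one of possibly several expert boxes, the sketch takes the best match over all annotations; a full evaluation would also have to decide how to treat multiple predictions per scan.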
Cite
Bercea et al. "NOVA: A Benchmark for Rare Anomaly Localization and Clinical Reasoning in Brain MRI." Advances in Neural Information Processing Systems, 2025.
[Bercea et al. "NOVA: A Benchmark for Rare Anomaly Localization and Clinical Reasoning in Brain MRI." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/bercea2025neurips-nova/)
@inproceedings{bercea2025neurips-nova,
title = {{NOVA: A Benchmark for Rare Anomaly Localization and Clinical Reasoning in Brain MRI}},
author = {Bercea, Cosmin I. and Li, Jun and Raffler, Philipp and Riedel, Evamaria Olga and Schmitzer, Lena and Kurz, Angela and Bitzer, Felix and Roßmüller, Paula and Canisius, Julian and Beyrle, Mirjam L. and Liu, Che and Bai, Wenjia and Kainz, Bernhard and Schnabel, Julia A. and Wiestler, Benedikt},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
url = {https://mlanthology.org/neurips/2025/bercea2025neurips-nova/}
}