A Closer Look at In-Distribution vs. Out-of-Distribution Accuracy for Open-Set Test-Time Adaptation

Li, Zefeng; Shelhamer, Evan

TMLR 2026

Abstract

Open-set test-time adaptation (TTA) updates models on new data in the presence of input shifts and unknown output classes. While recent methods have made progress on improving in-distribution (InD) accuracy for known classes, their ability to accurately detect out-of-distribution (OOD) unknown classes remains underexplored. We benchmark robust and open-set TTA methods (SAR, OSTTA, UniEnt, and SoTTA) on the standard corruption benchmarks of CIFAR-10-C at the small scale and ImageNet-C at the large scale. For CIFAR-10-C, we use OOD data from SVHN and CIFAR-100 in their respective corrupted forms of SVHN-C and CIFAR-100-C. For ImageNet-C, we use OOD data from ImageNet-O and Textures in their respective corrupted forms of ImageNet-O-C and Textures-C. ImageNet-O is nearer to ImageNet, since it contains unknown but related object classes (like "garlic bread" vs. "hot dog" for food, or "highway" vs. "dam" for infrastructure), while Textures is farther from ImageNet, since it contains non-object patterns (like "cracked" mud, "porous" sponge, "veined" leaves). We evaluate the accuracy and confidence of TTA methods for InD vs. OOD recognition on CIFAR-10-C and ImageNet-C. We verify the accuracy of each method's own OOD detection technique on CIFAR-10-C. We also evaluate on ImageNet-C and report both accuracy and standard OOD detection metrics. We further examine more realistic settings, in which the proportions and rates of OOD data can vary. To explore the trade-off between InD recognition and OOD rejection, we propose a new baseline that replaces softmax/multi-class output with sigmoid/multi-label output. Our analysis shows for the first time that current open-set TTA methods struggle to balance InD and OOD accuracy and that they only imperfectly filter OOD data for their own adaptation updates.

Cite

Text

Li and Shelhamer. "A Closer Look at In-Distribution vs. Out-of-Distribution Accuracy for Open-Set Test-Time Adaptation." Transactions on Machine Learning Research, 2026.

Markdown

[Li and Shelhamer. "A Closer Look at In-Distribution vs. Out-of-Distribution Accuracy for Open-Set Test-Time Adaptation." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/li2026tmlr-closer/)

BibTeX

@article{li2026tmlr-closer,
  title     = {{A Closer Look at In-Distribution vs. Out-of-Distribution Accuracy for Open-Set Test-Time Adaptation}},
  author    = {Li, Zefeng and Shelhamer, Evan},
  journal   = {Transactions on Machine Learning Research},
  year      = {2026},
  url       = {https://mlanthology.org/tmlr/2026/li2026tmlr-closer/}
}