Improving OOD Robustness via Background-Aware Test-Time-Augmentation in Black-Box and Resource Constrained Settings
Abstract
Deep learning models for text classification typically achieve strong performance on in-distribution (ID) data but often fail to generalize to out-of-distribution (OOD) inputs. This degradation frequently arises because models rely on spurious background cues (e.g., specific syntax or register) learned during training, which become unreliable when the domain changes. While recent Test-Time Augmentation (TTA) approaches have enabled robustness in black-box settings, they often rely on unconstrained rewriting strategies. For instance, standard In-Context Rewriting (ICR) instructs Large Language Models (LLMs) to modify input details to match ID exemplars, which creates a high risk of semantic drift and label flipping, particularly with smaller, resource-constrained LLMs. In this work, we propose a Background-Aware TTA (BA-TTA) framework that strictly disentangles style from semantics. Unlike prior methods that encourage broad paraphrasing, we use a semantically constrained alignment strategy that enables small, efficient LLMs to transform specific background attributes, such as tone and sentence structure, to match in-distribution priors while explicitly enforcing the preservation of the original meaning. This approach mitigates OOD degradation by neutralizing spurious background shifts, allowing frozen black-box models to process inputs in their native distribution without risking semantic corruption. Empirical evaluations across multiple text classification benchmarks demonstrate that our targeted alignment strategy outperforms unconstrained augmentation baselines. By generating higher-fidelity augmentations, our method achieves superior OOD robustness with reduced computational overhead, establishing a viable path for deploying robust models in resource-limited black-box environments. We validate the versatility of BA-TTA using a range of open-weights generators, from Llama-2 based models to the recent Llama-3.1-8B and Qwen-2.5-7B, showing consistent gains across model families.
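The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the prompt wording, the function names `build_prompt` and `tta_predict`, the `rewrite` and `classify` callables, the inclusion of the original input among the candidates, and the aggregation by averaging class probabilities are all assumptions made for illustration.

```python
from collections.abc import Callable, Sequence


def build_prompt(text: str, id_exemplars: Sequence[str]) -> str:
    """Hypothetical prompt: restyle tone and structure only, keep the meaning."""
    examples = "\n".join(f"- {e}" for e in id_exemplars)
    return (
        "Rewrite the input so its tone and sentence structure match the examples.\n"
        "Do not add, remove, or change any facts, opinions, or sentiment.\n"
        f"Examples:\n{examples}\n"
        f"Input: {text}\n"
        "Rewrite:"
    )


def tta_predict(
    text: str,
    id_exemplars: Sequence[str],
    rewrite: Callable[[str], str],            # assumed wrapper around a small LLM
    classify: Callable[[str], list[float]],   # assumed frozen black-box classifier
    n_aug: int = 4,
) -> tuple[int, list[float]]:
    """Classify the original input plus n_aug restyled copies, then average."""
    prompt = build_prompt(text, id_exemplars)
    # Assumption: the original input is kept alongside its augmentations.
    candidates = [text] + [rewrite(prompt) for _ in range(n_aug)]
    probs = [classify(c) for c in candidates]
    n_classes = len(probs[0])
    # Assumption: predictions are aggregated by a simple mean of probabilities.
    mean = [sum(p[k] for p in probs) / len(probs) for k in range(n_classes)]
    label = max(range(n_classes), key=mean.__getitem__)
    return label, mean
```

The classifier is only ever called through `classify`, so it can stay a frozen black box; the only component that changes is the prompt given to the rewriting LLM.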
Cite
Text
Song et al. "Improving OOD Robustness via Background-Aware Test-Time-Augmentation in Black-Box and Resource Constrained Settings." Transactions on Machine Learning Research, 2026.
Markdown
[Song et al. "Improving OOD Robustness via Background-Aware Test-Time-Augmentation in Black-Box and Resource Constrained Settings." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/song2026tmlr-improving/)
BibTeX
@article{song2026tmlr-improving,
title = {{Improving OOD Robustness via Background-Aware Test-Time-Augmentation in Black-Box and Resource Constrained Settings}},
author = {Song, Ping and Ojo, Adegboyega and Curry, Edward},
journal = {Transactions on Machine Learning Research},
year = {2026},
url = {https://mlanthology.org/tmlr/2026/song2026tmlr-improving/}
}