FORTRESS: Fast, Tuning-Free Retrieval Ensemble for Scalable LLM Safety

Abstract

The rapid adoption of Large Language Models in user-facing applications has magnified security risks, as adversarial prompts continue to circumvent built-in safeguards with increasing sophistication. Current external safety classifiers predominantly rely on supervised fine-tuning—a computationally expensive approach that proves brittle against novel attacks and demands constant retraining cycles. We present FORTRESS, a Fast, Orchestrated Tuning-free Retrieval Ensemble for Scalable Safety that eliminates the need for costly, gradient-based fine-tuning. Our framework unifies semantic retrieval and dynamic perplexity analysis with a single instruction-tuned LLM, creating an efficient pipeline that adapts to emerging threats through simple data ingestion rather than model retraining. FORTRESS employs a novel dynamic ensemble strategy that adaptively weights complementary signals: semantic similarity for known threat patterns and statistical anomaly detection for zero-day attacks. Extensive evaluation across nine safety benchmarks demonstrates that FORTRESS achieves state-of-the-art performance with an F1 score of 91.6%, while running more than five times faster than leading fine-tuned classifiers. Its data-centric design enables rapid adaptation to new threats by ingesting new threat data—a process we show improves performance without a latency trade-off—offering a practical, scalable, and robust approach to LLM safety.
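
The abstract does not specify the exact scoring rule, so purely as an illustration of how a retrieval-plus-perplexity ensemble of this kind might combine its two signals, here is a minimal Python sketch. Every name, weight, and threshold below (ensemble_risk, ppl_mean, the similarity-driven weighting) is a hypothetical stand-in, not the paper's actual method.

import numpy as np

def ensemble_risk(prompt_embedding, threat_index, perplexity,
                  ppl_mean=20.0, ppl_std=8.0):
    """Hypothetical sketch of a retrieval + perplexity safety ensemble.

    threat_index: matrix of unit-normalized embeddings of known
    adversarial prompts. perplexity: the prompt's perplexity under
    some reference LM. All constants are illustrative placeholders.
    """
    # Semantic signal: max cosine similarity to any known threat.
    sims = threat_index @ prompt_embedding
    sem_score = float(np.max(sims))  # high => resembles a known attack

    # Statistical signal: z-scored perplexity squashed to (0, 1);
    # unusually high perplexity suggests an obfuscated / zero-day prompt.
    z = (perplexity - ppl_mean) / ppl_std
    anomaly_score = 1.0 / (1.0 + np.exp(-z))

    # Dynamic weighting: trust retrieval when a close match exists,
    # otherwise lean on the anomaly signal.
    w = float(np.clip(sem_score, 0.0, 1.0))
    return w * sem_score + (1.0 - w) * anomaly_score

# Toy usage with random unit vectors standing in for real embeddings.
rng = np.random.default_rng(0)
index = rng.normal(size=(100, 384))
index /= np.linalg.norm(index, axis=1, keepdims=True)
query = index[0] + 0.1 * rng.normal(size=384)
query /= np.linalg.norm(query)
print(ensemble_risk(query, index, perplexity=35.0))

The similarity-driven weight mirrors the division of labor described in the abstract: when a prompt closely matches a known attack in the index, the retrieval signal dominates; when nothing matches, the decision falls back to the perplexity-based anomaly score.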

Cite

Text

Chang and Tsai. "FORTRESS: Fast, Tuning-Free Retrieval Ensemble for Scalable LLM Safety." Transactions on Machine Learning Research, 2025.

Markdown

[Chang and Tsai. "FORTRESS: Fast, Tuning-Free Retrieval Ensemble for Scalable LLM Safety." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/chang2025tmlr-fortress/)

BibTeX

@article{chang2025tmlr-fortress,
  title     = {{FORTRESS: Fast, Tuning-Free Retrieval Ensemble for Scalable LLM Safety}},
  author    = {Chang, Chi-Wei and Tsai, Richard Tzong-Han},
  journal   = {Transactions on Machine Learning Research},
  year      = {2025},
  url       = {https://mlanthology.org/tmlr/2025/chang2025tmlr-fortress/}
}