BTZSC: A Benchmark for Zero-Shot Text Classification Across Cross-Encoders, Embedding Models, Rerankers and LLMs

Aarab, Ilias

ICLR 2026

Abstract

Zero-shot text classification (ZSC) promises to eliminate costly task-specific annotation by matching texts directly to human-readable label descriptions. While early approaches relied predominantly on cross-encoder models fine-tuned for natural language inference (NLI), recent advances in text-embedding models, rerankers, and instruction-tuned large language models (LLMs) have challenged the dominance of NLI-based architectures. Yet systematically comparing these diverse approaches remains difficult. Existing evaluations, such as MTEB, often incorporate labeled examples through supervised probes or fine-tuning, leaving genuine zero-shot capabilities underexplored. To address this, we introduce __BTZSC__, a comprehensive benchmark of $22$ public datasets spanning sentiment, topic, intent, and emotion classification, capturing diverse domains, class cardinalities, and document lengths. Leveraging BTZSC, we conduct a systematic comparison across four major model families (NLI cross-encoders, embedding models, rerankers, and instruction-tuned LLMs), encompassing $38$ public and custom checkpoints. Our results show that: (i) modern rerankers, exemplified by _Qwen3-Reranker-8B_, set a new state of the art with macro $F_1 = 0.72$; (ii) strong embedding models such as _GTE-large-en-v1.5_ substantially close the accuracy gap while offering the best trade-off between accuracy and latency; (iii) instruction-tuned LLMs at 4–12B parameters achieve competitive performance (macro $F_1$ up to $0.67$), excelling particularly on topic classification but trailing specialized rerankers; (iv) NLI cross-encoders plateau even as backbone size increases; and (v) scaling primarily benefits rerankers and LLMs over embedding models. BTZSC and accompanying evaluation code are publicly released to support fair and reproducible progress in zero-shot text understanding.
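As a rough illustration of the setup the abstract describes, the sketch below matches a text to the most similar label description and scores predictions with macro $F_1$. It is not the benchmark's released evaluation code: the `embed` function is a toy bag-of-words stand-in for a real embedding model, and all function names and label descriptions here are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model (assumption): bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def zero_shot_predict(text, label_descriptions):
    """Return the label whose description is most similar to the text."""
    doc = embed(text)
    return max(
        label_descriptions,
        key=lambda label: cosine(doc, embed(label_descriptions[label])),
    )


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical label descriptions; no labeled training examples are used.
LABELS = {
    "positive": "a great wonderful positive review",
    "negative": "a terrible awful negative review",
}

texts = ["the movie was wonderful", "an awful terrible film"]
gold = ["positive", "negative"]
predictions = [zero_shot_predict(t, LABELS) for t in texts]
score = macro_f1(gold, predictions)
```

Because macro $F_1$ averages per-class scores without weighting by class frequency, rare classes count as much as common ones, which matters when datasets differ widely in class cardinality.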

Cite

Text

Aarab. "BTZSC: A Benchmark for Zero-Shot Text Classification Across Cross-Encoders, Embedding Models, Rerankers and LLMs." International Conference on Learning Representations, 2026.

Markdown

[Aarab. "BTZSC: A Benchmark for Zero-Shot Text Classification Across Cross-Encoders, Embedding Models, Rerankers and LLMs." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/aarab2026iclr-btzsc/)

BibTeX

@inproceedings{aarab2026iclr-btzsc,
  title     = {{BTZSC: A Benchmark for Zero-Shot Text Classification Across Cross-Encoders, Embedding Models, Rerankers and LLMs}},
  author    = {Aarab, Ilias},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/aarab2026iclr-btzsc/}
}