Avey-B

Abstract

Compact pretrained bidirectional encoders remain the backbone of industrial NLP under tight compute and memory budgets. Their effectiveness stems from self-attention’s ability to deliver high-quality bidirectional contextualization with sequence-level parallelism, as popularized by BERT-style architectures. Recently, **Avey** was introduced as an autoregressive, attention-free alternative that naturally admits an encoder-only adaptation. In this paper, we reformulate Avey for the encoder-only paradigm and propose several innovations to its architecture, including decoupled static and dynamic parameterizations, stability-oriented normalization, and neural compression. Results show that this reformulated architecture compares favorably to four widely used Transformer-based encoders, consistently outperforming them on standard token-classification and information-retrieval benchmarks while scaling more efficiently to long contexts.

Cite

Text

Acharya and Hammoud. "Avey-B." International Conference on Learning Representations, 2026.

Markdown

[Acharya and Hammoud. "Avey-B." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/acharya2026iclr-aveyb/)

BibTeX

@inproceedings{acharya2026iclr-aveyb,
  title     = {{Avey-B}},
  author    = {Acharya, Devang and Hammoud, Mohammad},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/acharya2026iclr-aveyb/}
}