AttentionSmithy: A Modular Framework for Rapid Transformer Development

Abstract

Transformer architectures have revolutionized a broad spectrum of AI applications by leveraging attention mechanisms for parallelized and long-range sequence processing. Despite their remarkable success, building and customizing transformers remains prohibitively complex for many domain experts who lack deep knowledge of low-level implementations. We introduce AttentionSmithy, a modular software package that lowers the barrier to transformer innovation by decomposing key components (attention modules, feed-forward networks, normalization layers, and positional encodings) into reusable building blocks. By disentangling architectural elements into well-defined interfaces, users can rapidly prototype, adapt, and evaluate transformer variants without extensive coding overhead. Our framework currently supports four distinct positional encoding strategies (sinusoidal, learned, rotary, and ALiBi), offers modular integration of multiple attention methods (including standard attention, Longformer, and Linformer), and integrates seamlessly with neural architecture search (NAS) for automated design exploration. The system is designed to support future extensions with minimal overhead. We validate AttentionSmithy by replicating the original "Attention Is All You Need" transformer under resource constraints, demonstrating robust performance on a machine translation task. Leveraging the package's integrated NAS capability, we identified an optimized model configuration that outperformed our baseline, demonstrating the framework's effectiveness for automated architecture search and model improvement. We further illustrate AttentionSmithy's adaptability through gene-specific modeling, where a variant of a BERT-style architecture achieves over 95% accuracy on downstream cell type classification tasks using ranked transcriptomic data.
These case studies underscore AttentionSmithy's core advantage: enabling specialized experimentation across diverse application domains, from natural language processing to genomic analysis, by obviating the need for labor-intensive, low-level framework manipulation. We anticipate that AttentionSmithy will serve as a foundation for creative transformer-based solutions, expediting research and development in numerous scientific and industrial fields.
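The modular composition described in the abstract can be illustrated with a minimal PyTorch sketch. This is a hypothetical interface written for illustration only, not AttentionSmithy's actual API: the idea is that positional encodings (and, by extension, attention or normalization variants) are interchangeable modules passed into a shared encoder.

```python
# Illustrative sketch only -- hypothetical class names, not AttentionSmithy's real API.
import math
import torch
import torch.nn as nn

class SinusoidalEncoding(nn.Module):
    """Fixed sinusoidal position embeddings, as in the original transformer."""
    def __init__(self, d_model, max_len=512):
        super().__init__()
        pos = torch.arange(max_len).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)

    def forward(self, x):  # x: (batch, seq, d_model)
        return x + self.pe[: x.size(1)]

class LearnedEncoding(nn.Module):
    """Trainable position embeddings; a drop-in replacement for the sinusoidal variant."""
    def __init__(self, d_model, max_len=512):
        super().__init__()
        self.emb = nn.Embedding(max_len, d_model)

    def forward(self, x):
        positions = torch.arange(x.size(1), device=x.device)
        return x + self.emb(positions)

class TinyEncoder(nn.Module):
    """A minimal encoder that accepts any positional-encoding module via its constructor."""
    def __init__(self, d_model, nhead, pos_encoding):
        super().__init__()
        self.pos_encoding = pos_encoding
        self.layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)

    def forward(self, x):
        return self.layer(self.pos_encoding(x))

# Swapping encoding strategies requires changing only the injected module.
x = torch.randn(2, 16, 32)  # (batch, seq, d_model)
for pe in (SinusoidalEncoding(32), LearnedEncoding(32)):
    model = TinyEncoder(d_model=32, nhead=4, pos_encoding=pe)
    print(model(x).shape)  # torch.Size([2, 16, 32])
```

A NAS loop like the one the paper describes could then treat the choice of injected module as a searchable hyperparameter, which is the kind of experimentation the interface-based decomposition is meant to enable.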

Cite

Text

Cranney and Meyer. "AttentionSmithy: A Modular Framework for Rapid Transformer Development." Transactions on Machine Learning Research, 2025.

Markdown

[Cranney and Meyer. "AttentionSmithy: A Modular Framework for Rapid Transformer Development." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/cranney2025tmlr-attentionsmithy/)

BibTeX

@article{cranney2025tmlr-attentionsmithy,
  title     = {{AttentionSmithy: A Modular Framework for Rapid Transformer Development}},
  author    = {Cranney, Caleb and Meyer, Jesse G},
  journal   = {Transactions on Machine Learning Research},
  year      = {2025},
  url       = {https://mlanthology.org/tmlr/2025/cranney2025tmlr-attentionsmithy/}
}