Geometric Constraints for Small Language Models to Understand and Expand Scientific Taxonomies
Abstract
Recent findings reveal that token embeddings of Large Language Models (LLMs) exhibit strong hyperbolicity. This insight motivates leveraging LLMs for scientific taxonomy tasks, where maintaining and expanding hierarchical knowledge structures is critical. Despite this potential, generally trained LLMs face challenges in directly handling domain-specific taxonomies, including computational cost and hallucination. Meanwhile, Small Language Models (SLMs) provide a more economical alternative if empowered with proper knowledge transfer. In this work, we introduce SS-Mono (Structure-Semantic Monotonization), a novel pipeline that combines local taxonomy augmentation from LLMs, self-supervised fine-tuning of SLMs with geometric constraints, and LLM calibration. Our approach enables efficient and accurate taxonomy expansion across root, leaf, and intermediate nodes. Extensive experiments on both leaf and non-leaf expansion benchmarks demonstrate that a fine-tuned SLM (e.g., DistilBERT-base-110M) consistently outperforms frozen LLMs (e.g., GPT-4o, Gemma-2-9B) and domain-specific baselines. These findings highlight the promise of lightweight yet effective models for structured knowledge enrichment in scientific domains.
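As an illustration of what "hyperbolicity" can mean for a set of points, one common measure is Gromov's four-point delta: a value of 0 indicates a perfectly tree-like metric, and larger values indicate less tree-like structure. The sketch below is only an assumption about how such a quantity might be computed. The abstract does not state which hyperbolicity measure the cited findings use, and the function name `gromov_delta` and the toy distance matrices are hypothetical.

```python
from itertools import combinations


def gromov_delta(dist):
    """Return the four-point Gromov delta of a finite metric space.

    `dist` is a symmetric matrix (list of lists) of pairwise distances.
    For each quadruple, the three pairwise-sum pairings are sorted and
    half the gap between the two largest is taken; delta is the maximum
    over all quadruples. This brute-force loop is O(n^4), so it is only
    suitable for small samples.
    """
    delta = 0.0
    for x, y, z, w in combinations(range(len(dist)), 4):
        sums = sorted([
            dist[x][y] + dist[z][w],
            dist[x][z] + dist[y][w],
            dist[x][w] + dist[y][z],
        ])
        delta = max(delta, (sums[2] - sums[1]) / 2.0)
    return delta


# Toy example 1 (hypothetical): a star tree with center 0 and leaves 1, 2, 3.
star_tree = [
    [0, 1, 1, 1],
    [1, 0, 2, 2],
    [1, 2, 0, 2],
    [1, 2, 2, 0],
]

# Toy example 2 (hypothetical): a 4-cycle with unit-length edges.
four_cycle = [
    [0, 1, 2, 1],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [1, 2, 1, 0],
]

tree_delta = gromov_delta(star_tree)    # tree metric
cycle_delta = gromov_delta(four_cycle)  # cycle metric
```

The star tree yields a delta of 0, while the 4-cycle yields a strictly positive delta, which matches the intuition that hierarchical (taxonomy-like) structures are the tree-like case.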
Cite
Text
Fang et al. "Geometric Constraints for Small Language Models to Understand and Expand Scientific Taxonomies." International Conference on Learning Representations, 2026.
Markdown
[Fang et al. "Geometric Constraints for Small Language Models to Understand and Expand Scientific Taxonomies." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/fang2026iclr-geometric/)
BibTeX
@inproceedings{fang2026iclr-geometric,
title = {{Geometric Constraints for Small Language Models to Understand and Expand Scientific Taxonomies}},
author = {Fang, Liri and Fu, Dongqi and Han, Jiawei and He, Jingrui and Torvik, Vetle I},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/fang2026iclr-geometric/}
}