Deep Hierarchical Learning with Nested Subspace Networks for Large Language Models
Abstract
Large neural networks are typically trained for a fixed computational budget, creating a rigid trade-off between performance and efficiency that is ill-suited for deployment in resource-constrained or dynamic environments. Existing approaches to this problem present a difficult choice: training a discrete collection of specialist models is computationally prohibitive, while dynamic methods like slimmable networks often lack the flexibility to be applied to large, pre-trained foundation models. In this work, we propose *Nested Subspace Networks (NSNs)*, a novel architectural paradigm that enables a single model to be dynamically and granularly adjusted across a continuous spectrum of compute budgets at inference time. The core of our approach is to re-parameterize linear layers to satisfy a nested subspace property, such that the function computed at a given rank is a strict subspace of the function at any higher rank. We show that this entire hierarchy of models can be optimized jointly via an uncertainty-aware objective that learns to balance the contributions of different ranks based on their intrinsic difficulty. We demonstrate empirically that NSNs can be surgically applied to pre-trained LLMs and unlock a smooth and predictable compute-performance frontier. For example, a single NSN-adapted model can achieve a 50% reduction in inference FLOPs with only a 5 percentage point loss in accuracy. Our findings establish NSNs as a powerful framework for creating the next generation of adaptive foundation models.
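As an illustration of the nested subspace idea, the following is a minimal sketch, not the authors' implementation. It assumes a linear layer is factored as a product of two matrices `B` and `A`, and that the rank-`r` model simply uses the first `r` rows of `A` and the first `r` columns of `B`. The class name `NestedLowRankLinear`, the factor names, and the FLOP formula are all hypothetical choices made for this example; the paper's actual re-parameterization and training objective may differ.

```python
import random


def matvec(M, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


class NestedLowRankLinear:
    """Hypothetical nested low-rank layer: y = B[:, :r] (A[:r, :] x).

    Assumption: the rank-r model reuses the leading r components of the
    shared factors, so every lower-rank model is contained in every
    higher-rank one.
    """

    def __init__(self, d_in, d_out, max_rank, seed=0):
        rng = random.Random(seed)
        self.d_in, self.d_out, self.max_rank = d_in, d_out, max_rank
        # A has shape (max_rank, d_in); B has shape (d_out, max_rank).
        self.A = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(max_rank)]
        self.B = [[rng.gauss(0, 1) for _ in range(max_rank)] for _ in range(d_out)]

    def forward(self, x, rank):
        """Apply the layer using only the first `rank` components."""
        if not 1 <= rank <= self.max_rank:
            raise ValueError("rank must be between 1 and max_rank")
        z = matvec(self.A[:rank], x)                      # project down to `rank` dims
        return matvec([row[:rank] for row in self.B], z)  # project up to d_out dims

    def flops(self, rank):
        """Multiply-add count for one input vector at the given rank."""
        return rank * (self.d_in + self.d_out)


layer = NestedLowRankLinear(d_in=6, d_out=5, max_rank=4, seed=1)
x = [0.5, -1.0, 2.0, 0.0, 1.5, -0.25]
y_low = layer.forward(x, rank=2)   # cheaper, coarser output
y_high = layer.forward(x, rank=3)  # adds exactly one more rank-one component
```

In this sketch, raising the rank from `r` to `r + 1` adds a single rank-one term to the output and leaves the lower-rank contribution unchanged, which is one simple way to obtain nesting. The per-layer cost grows linearly with the chosen rank, so the rank acts as a compute knob at inference time.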
Cite
Text
Rauba and van der Schaar. "Deep Hierarchical Learning with Nested Subspace Networks for Large Language Models." International Conference on Learning Representations, 2026.
Markdown
[Rauba and van der Schaar. "Deep Hierarchical Learning with Nested Subspace Networks for Large Language Models." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/rauba2026iclr-deep/)
BibTeX
@inproceedings{rauba2026iclr-deep,
title = {{Deep Hierarchical Learning with Nested Subspace Networks for Large Language Models}},
author = {Rauba, Paulius and van der Schaar, Mihaela},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/rauba2026iclr-deep/}
}