QUTE: Quantifying Uncertainty in TinyML Models with Early-Exit-Assisted Ensembles for Model-Monitoring

Abstract

Uncertainty quantification (UQ) provides a resource-efficient solution for on-device monitoring of tinyML models deployed remotely without access to true labels. However, existing UQ methods impose significant memory and compute demands, making them impractical for ultra-low-power, KB-sized tinyML devices. Prior work has attempted to reduce overhead by using early-exit ensembles to quantify uncertainty in a single forward pass, but these approaches still carry prohibitive costs. To address this, we propose QUTE, a novel resource-efficient early-exit-assisted ensemble architecture optimized for tinyML models. QUTE introduces additional output blocks at the final exit of the base network, distilling early-exit knowledge into these blocks to form a diverse yet lightweight ensemble. We show that QUTE delivers superior uncertainty quality on tiny models and comparable performance on larger models, with model sizes 59% smaller than the closest prior work. When deployed on a microcontroller, QUTE reduces latency by 31% on average. In addition, we show that QUTE excels at detecting accuracy-drop events, outperforming all prior work.
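A minimal sketch of the single-forward-pass ensembling idea described above: several lightweight output heads attached at the final exit each produce a class distribution, and the disagreement among them (here scored by the predictive entropy of the averaged distribution) serves as the uncertainty signal. This is an illustrative NumPy toy, not the authors' implementation; the head logits and scoring rule are placeholders.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def ensemble_uncertainty(head_logits):
    """Average the softmax outputs of the ensemble output blocks
    (computed in one forward pass of the shared backbone) and use
    the predictive entropy of the mean distribution as uncertainty."""
    probs = softmax(np.asarray(head_logits), axis=-1)   # (heads, classes)
    mean_probs = probs.mean(axis=0)                     # ensemble average
    entropy = -np.sum(mean_probs * np.log(mean_probs + 1e-12))
    return int(mean_probs.argmax()), float(entropy)

# Heads that agree -> low uncertainty; heads that disagree -> high.
agree = [[4.0, 0.0, 0.0], [3.5, 0.1, 0.0], [4.2, 0.0, 0.2]]
disagree = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
pred, u_lo = ensemble_uncertainty(agree)
_, u_hi = ensemble_uncertainty(disagree)
```

On a microcontroller deployment, only the extra heads add cost on top of the shared backbone, which is why this style of ensemble stays within a tinyML memory budget.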

Cite

Text

Ghanathe and Wilton. "QUTE: Quantifying Uncertainty in TinyML Models with Early-Exit-Assisted Ensembles for Model-Monitoring." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Ghanathe and Wilton. "QUTE: Quantifying Uncertainty in TinyML Models with Early-Exit-Assisted Ensembles for Model-Monitoring." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/ghanathe2025icml-qute/)

BibTeX

@inproceedings{ghanathe2025icml-qute,
  title     = {{QUTE: Quantifying Uncertainty in TinyML Models with Early-Exit-Assisted Ensembles for Model-Monitoring}},
  author    = {Ghanathe, Nikhil Pratap and Wilton, Steven J E},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {19286--19306},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/ghanathe2025icml-qute/}
}