AB-STE: Adaptive Blended Gradient Estimation for Efficient Binarized Networks

Gupta, Siddharth; Kumar, Akash

doi:10.1007/978-3-032-06066-2_8

ECML-PKDD 2025 pp. 125-140

Abstract

Binary Neural Networks (BNNs) offer a highly efficient alternative to traditional deep learning models by drastically reducing memory and computational demands, making them well-suited for deployment in resource-constrained environments like edge devices. Despite their efficiency, BNNs are often limited by inaccurate and unstable gradient estimation using traditional Straight Through Estimator (STE) methods, which disrupt gradient flow and impede convergence. BinaryConnect introduced STE to approximate the gradients of the sign function; however, this approximation causes significant inconsistencies, ultimately compromising training stability. While various methods have been proposed to address these issues, many fail to consider that minimizing estimation error can inadvertently reduce gradient stability. Such highly divergent gradients can increase the risk of vanishing or exploding gradients, thereby hindering effective training. In this paper, we propose two novel Adaptive Blended Straight Through Estimators (AB-STE): AB-ArcTan-STE and AB-Tanh-STE. Unlike previous methods, AB-STE blends a linear component with a non-linear function to provide both stability and expressiveness during training, addressing key challenges faced by BNNs. By combining the simplicity of linearity with the representational power of non-linearity, AB-STE maintains a balanced gradient flow throughout training, ensuring both stability and effective learning. Extensive experiments on CIFAR-10 and ImageNet demonstrate that AB-STE achieves superior performance, surpassing existing state-of-the-art methods. Specifically, our AB-Tanh-STE achieved an accuracy of 94.60% on ResNet-18 for CIFAR-10, and a Top-1 accuracy of 67.96% on ImageNet, demonstrating the effectiveness of our adaptive blending strategy in enhancing training stability and accuracy. Notably, the parameters were binarized to achieve efficiency while maintaining stable gradient flow.
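
The abstract does not give the AB-STE formulas, so the following is only an illustrative sketch of what blending a linear and a non-linear surrogate gradient for the sign function could look like. The convex-combination form, the mixing weight `alpha`, the sharpness `k`, and all function names are assumptions made for this example, not the authors' definitions; in particular, how the paper adapts the blend during training is not stated here, so `alpha` is simply passed in by the caller.

```python
import math


def sign(x: float) -> float:
    """Forward pass: binarize a real value to -1 or +1 (0 maps to +1 here)."""
    return 1.0 if x >= 0 else -1.0


def linear_ste_grad(x: float, clip: float = 1.0) -> float:
    """Classic clipped STE: pass the gradient through where |x| <= clip."""
    return 1.0 if abs(x) <= clip else 0.0


def tanh_grad(x: float, k: float = 2.0) -> float:
    """Derivative of tanh(k * x), a smooth stand-in for sign (k is assumed)."""
    t = math.tanh(k * x)
    return k * (1.0 - t * t)


def arctan_grad(x: float, k: float = 2.0) -> float:
    """Derivative of arctan(k * x), a heavier-tailed smooth stand-in."""
    return k / (1.0 + (k * x) ** 2)


def blended_grad(x: float, alpha: float, nonlinear=tanh_grad) -> float:
    """Hypothetical blend: alpha * linear part + (1 - alpha) * non-linear part.

    alpha = 1 recovers the plain clipped STE; alpha = 0 uses only the
    smooth surrogate. This weighting scheme is an assumption.
    """
    return alpha * linear_ste_grad(x) + (1.0 - alpha) * nonlinear(x)


if __name__ == "__main__":
    # Compare the surrogate gradients at a few pre-activation values.
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(
            f"x={x:+.1f}  sign={sign(x):+.0f}  "
            f"tanh-blend={blended_grad(x, 0.5):.4f}  "
            f"arctan-blend={blended_grad(x, 0.5, arctan_grad):.4f}"
        )
```

In this sketch the linear term keeps the gradient bounded and non-zero near the origin, while the non-linear term shapes it according to the distance from the binarization threshold, which is one plausible reading of the stability and expressiveness trade-off the abstract describes.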

Cite

Text

Gupta and Kumar. "AB-STE: Adaptive Blended Gradient Estimation for Efficient Binarized Networks." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025. doi:10.1007/978-3-032-06066-2_8

Markdown

[Gupta and Kumar. "AB-STE: Adaptive Blended Gradient Estimation for Efficient Binarized Networks." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025.](https://mlanthology.org/ecmlpkdd/2025/gupta2025ecmlpkdd-abste/) doi:10.1007/978-3-032-06066-2_8

BibTeX

@inproceedings{gupta2025ecmlpkdd-abste,
  title     = {{AB-STE: Adaptive Blended Gradient Estimation for Efficient Binarized Networks}},
  author    = {Gupta, Siddharth and Kumar, Akash},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2025},
  pages     = {125--140},
  doi       = {10.1007/978-3-032-06066-2_8},
  url       = {https://mlanthology.org/ecmlpkdd/2025/gupta2025ecmlpkdd-abste/}
}