Accelerated Training on Low-Power Edge Devices
Abstract
Training on edge devices poses several challenges as these devices are generally resource-constrained, especially in terms of power. State-of-the-art techniques at the device level reduce the GPU frequency to enforce power constraints, leading to a significant increase in training time. To accelerate training, we propose to jointly adjust the system and application parameters (in our case, the GPU frequency and the batch size of the training task) while adhering to the power constraints on devices. We introduce a novel cross-layer methodology that combines predictions of batch size efficiency with device profiling to achieve the desired optimization. Our evaluation on real hardware shows that our method outperforms baselines that rely on state-of-the-art techniques, reducing the training time by up to $2.3\times$ while achieving results very close to the optimum. Our measurements also indicate a substantial reduction in the overall energy used for the training process. These gains are achieved without degrading the performance of the trained model.
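To illustrate the idea of jointly selecting the GPU frequency and batch size under a power constraint, below is a minimal Python sketch. It is not the authors' implementation: the profiling table, the power limit, and the `efficiency` lookup (standing in for the paper's batch-size efficiency prediction) are hypothetical placeholders.

```python
# Hypothetical device profile: measured power (W) and throughput (samples/s)
# for each (GPU frequency, batch size) pair. Values are illustrative only.
profile = {
    # (freq_mhz, batch_size): (power_watts, samples_per_sec)
    (1300, 64): (14.8, 410.0),
    (1300, 32): (14.1, 380.0),
    (900, 64):  (10.2, 300.0),
    (900, 32):  (9.6, 270.0),
    (600, 64):  (7.1, 190.0),
    (600, 32):  (6.8, 170.0),
}

def select_config(profile, power_limit_watts, efficiency):
    """Pick the (frequency, batch size) pair that maximizes effective
    training progress per second while respecting the power limit.

    `efficiency` maps batch size -> predicted per-sample training
    progress, a stand-in for the batch-size efficiency prediction.
    """
    best, best_rate = None, 0.0
    for (freq, batch), (power, throughput) in profile.items():
        if power > power_limit_watts:
            continue  # configuration violates the device power constraint
        rate = throughput * efficiency[batch]  # effective progress per second
        if rate > best_rate:
            best, best_rate = (freq, batch), rate
    return best

# Example: assume smaller batches yield slightly better per-sample progress.
efficiency = {32: 1.0, 64: 0.92}
print(select_config(profile, power_limit_watts=10.0, efficiency=efficiency))
```

The point of the sketch is the trade-off it encodes: simply capping the frequency (the device-level baseline) ignores the batch-size dimension, whereas searching over both dimensions can find a configuration that meets the same power budget with higher effective training throughput.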
Cite
Text
Ahmed et al. "Accelerated Training on Low-Power Edge Devices." Transactions on Machine Learning Research, 2025.
Markdown
[Ahmed et al. "Accelerated Training on Low-Power Edge Devices." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/ahmed2025tmlr-accelerated/)
BibTeX
@article{ahmed2025tmlr-accelerated,
title = {{Accelerated Training on Low-Power Edge Devices}},
author = {Ahmed, Mohamed Aboelenien and Pfeiffer, Kilian and Abboud, Osama and Khalili, Ramin and Khdr, Heba and Henkel, Joerg},
journal = {Transactions on Machine Learning Research},
year = {2025},
url = {https://mlanthology.org/tmlr/2025/ahmed2025tmlr-accelerated/}
}