An Intelligent Battery Controller Using Bias-Corrected Q-Learning

Abstract

The transition to renewables requires storage to help smooth short-term variations in energy from wind and solar sources, as well as to respond to spikes in electricity spot prices, which can easily exceed 20 times their average. Efficient operation of an energy storage device is a fundamental problem, yet classical algorithms such as $Q$-learning can diverge for millions of iterations, limiting practical applications. We have traced this behavior to the max-operator bias, which is exacerbated by high volatility in the reward function and by the high discount factors that small time steps require. We propose an elegant bias correction procedure and demonstrate its effectiveness.
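
The max-operator bias named in the abstract can be illustrated with a small simulation. The sketch below is not the authors' correction procedure, and it does not model a battery or discounting; it only shows, under assumed Gaussian noise, that the maximum over noisy value estimates overestimates the true maximum and that the overestimate grows with the noise level. The function name `max_operator_bias` and all parameter values are hypothetical choices made for illustration.

```python
import random
import statistics


def max_operator_bias(num_actions, noise_std, num_trials, seed=0):
    """Estimate E[max_a Qhat(a)] - max_a Q(a) when every true Q(a) is 0.

    Assumption (not from the paper): each estimate Qhat(a) equals the
    true value plus independent zero-mean Gaussian noise.
    """
    rng = random.Random(seed)
    maxima = []
    for _ in range(num_trials):
        # Every action has true value 0.0, so any nonzero estimate is pure noise.
        estimates = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        maxima.append(max(estimates))
    # The true maximum is 0, so the average sampled maximum is the bias itself.
    return statistics.fmean(maxima)


# Same number of actions, two noise levels (a stand-in for reward volatility).
low_noise_bias = max_operator_bias(num_actions=5, noise_std=1.0, num_trials=20000)
high_noise_bias = max_operator_bias(num_actions=5, noise_std=20.0, num_trials=20000)

print(f"bias with noise_std=1:  {low_noise_bias:.3f}")
print(f"bias with noise_std=20: {high_noise_bias:.3f}")
```

Although each individual estimate is unbiased, both printed values are positive, and the second is much larger. This is the sense in which volatile rewards, such as spiking spot prices, can inflate the targets that $Q$-learning bootstraps from.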

Cite

Text

Lee and Powell. "An Intelligent Battery Controller Using Bias-Corrected Q-Learning." AAAI Conference on Artificial Intelligence, 2012. doi:10.1609/AAAI.V26I1.8164

Markdown

[Lee and Powell. "An Intelligent Battery Controller Using Bias-Corrected Q-Learning." AAAI Conference on Artificial Intelligence, 2012.](https://mlanthology.org/aaai/2012/lee2012aaai-intelligent/) doi:10.1609/AAAI.V26I1.8164

BibTeX

@inproceedings{lee2012aaai-intelligent,
  title     = {{An Intelligent Battery Controller Using Bias-Corrected Q-Learning}},
  author    = {Lee, Donghun and Powell, Warren B.},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2012},
  pages     = {316--322},
  doi       = {10.1609/AAAI.V26I1.8164},
  url       = {https://mlanthology.org/aaai/2012/lee2012aaai-intelligent/}
}