Learning with Contextual Information in Non-Stationary Environments

Abstract

We consider a repeated decision-making setting in which the decision maker has access to contextual information but lacks a model or a priori knowledge of the relationship among the actions, the context, and the costs it aims to minimize. Moreover, we assume that the environment may be non-stationary due to the presence of other agents reacting to our decisions. We propose an algorithm inspired by log-linear learning that uses Boltzmann distributions to generate stochastic policies. We consider two general notions of context and provide regret bounds for each: 1) a finite number of possible measurements and 2) a continuum of measurements that weight a finite set of classes. In the non-stationary setting, we incur some regret but can make it arbitrarily small. We illustrate the operation of the algorithm through two examples: one that uses synthetic data (based on the rock-paper-scissors game) and another that uses real data for malware classification. Both examples exhibit (by construction or naturally) significant non-stationarity.
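For intuition, the following is a minimal Python sketch of the kind of Boltzmann (softmax) policy the abstract describes, in the finite-context setting. It is not the paper's algorithm: the score-update rule and all names (n_contexts, n_actions, eta, scores, step) are illustrative assumptions.

import numpy as np

# Sketch: per-context score table with Boltzmann (softmax) action sampling,
# in the spirit of log-linear learning for a finite number of possible
# context measurements. Update rule and parameters are assumptions.

rng = np.random.default_rng(0)
n_contexts, n_actions = 3, 3   # e.g. a rock-paper-scissors-style example
eta = 0.5                      # inverse-temperature parameter (assumed)
scores = np.zeros((n_contexts, n_actions))  # running (negated) costs

def boltzmann_policy(context):
    """Stochastic policy: softmax of the scores for this context."""
    z = eta * scores[context]
    z -= z.max()               # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()

def step(context, cost_fn):
    """Sample an action, observe its cost, and update the scores."""
    p = boltzmann_policy(context)
    action = rng.choice(n_actions, p=p)
    cost = cost_fn(context, action)
    scores[context, action] -= cost  # lower cost -> higher future probability
    return action, cost

# Example usage with a toy, possibly time-varying cost (hypothetical):
# a, c = step(context=0, cost_fn=lambda ctx, act: float(act != ctx))

Because the policy is a Boltzmann distribution rather than a greedy argmax, it keeps exploring all actions, which is what allows an algorithm of this kind to track a non-stationary environment; the paper's regret bounds quantify this trade-off.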

Cite

Text

Anderson and Hespanha. "Learning with Contextual Information in Non-Stationary Environments." Proceedings of the 7th Annual Learning for Dynamics & Control Conference, 2025.

Markdown

[Anderson and Hespanha. "Learning with Contextual Information in Non-Stationary Environments." Proceedings of the 7th Annual Learning for Dynamics & Control Conference, 2025.](https://mlanthology.org/l4dc/2025/anderson2025l4dc-learning/)

BibTeX

@inproceedings{anderson2025l4dc-learning,
  title     = {{Learning with Contextual Information in Non-Stationary Environments}},
  author    = {Anderson, Sean and Hespanha, Joao P.},
  booktitle = {Proceedings of the 7th Annual Learning for Dynamics \& Control Conference},
  year      = {2025},
  pages     = {856--868},
  volume    = {283},
  url       = {https://mlanthology.org/l4dc/2025/anderson2025l4dc-learning/}
}