Entropy Regularization for Population Estimation
Abstract
Entropy regularization is known to improve exploration in sequential decision-making problems. We show that this same mechanism can also lead to nearly unbiased and lower-variance estimates of the mean reward in the optimize-and-estimate structured bandit setting. Mean reward estimation (i.e., population estimation) tasks have recently been shown to be essential for public policy settings where legal constraints often require precise estimates of population metrics. We show that leveraging entropy and KL divergence can yield a better trade-off between reward and estimator variance than existing baselines, all while remaining nearly unbiased. These properties of entropy regularization illustrate an exciting potential for bringing together the optimal exploration and estimation literature.
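The abstract describes how entropy-regularized (stochastic) sampling policies can keep mean-reward estimates nearly unbiased while still concentrating samples on high-reward units. As a rough illustration of that mechanism only, not the paper's actual algorithm, the sketch below uses a softmax sampling policy (a standard entropy-regularized choice) together with inverse-propensity weighting; all names, the synthetic data, and the temperature `tau` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: N units with unknown rewards, plus noisy model
# scores that guide where the budget of samples is spent.
N, budget, tau = 1000, 100, 0.5
rewards = rng.beta(2, 5, size=N)              # true per-unit rewards
preds = rewards + rng.normal(0, 0.1, size=N)  # noisy predictions

# Softmax sampling policy: the temperature tau trades reward against
# exploration (greedy as tau -> 0, uniform as tau -> infinity). Full
# support over all units is what makes unbiased estimation possible.
logits = preds / tau
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Sample with replacement under the policy, then form an
# inverse-propensity (Horvitz-Thompson-style) estimate of the
# population mean: E[r_i / (N * p_i)] = (1/N) * sum_i r_i.
idx = rng.choice(N, size=budget, p=probs)
est_mean = np.mean(rewards[idx] / (N * probs[idx]))

print(f"true mean: {rewards.mean():.4f}")
print(f"estimate:  {est_mean:.4f}")
```

Lowering `tau` collects more reward per sample but raises the variance of the inverse-propensity weights, which is the reward-versus-estimator-variance trade-off the abstract refers to.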
Cite
Text
Chugg et al. "Entropy Regularization for Population Estimation." AAAI Conference on Artificial Intelligence, 2023. doi:10.1609/AAAI.V37I10.26438
Markdown
[Chugg et al. "Entropy Regularization for Population Estimation." AAAI Conference on Artificial Intelligence, 2023.](https://mlanthology.org/aaai/2023/chugg2023aaai-entropy/) doi:10.1609/AAAI.V37I10.26438
BibTeX
@inproceedings{chugg2023aaai-entropy,
title = {{Entropy Regularization for Population Estimation}},
author = {Chugg, Ben and Henderson, Peter and Goldin, Jacob and Ho, Daniel E.},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2023},
pages = {12198-12204},
doi = {10.1609/AAAI.V37I10.26438},
url = {https://mlanthology.org/aaai/2023/chugg2023aaai-entropy/}
}