Strategic A/B Testing via Maximum Probability-Driven Two-Armed Bandit
Abstract
Detecting a minor average treatment effect is a major challenge in large-scale applications, where even minimal improvements can have a significant economic impact. Traditional methods, which rely on normal distribution-based or expanded statistics, often fail to identify such minor effects because they lack the sensitivity to resolve small discrepancies. This work leverages a counterfactual outcome framework and proposes a maximum probability-driven two-armed bandit (TAB) process that weights the mean volatility statistic, thereby controlling the Type I error. Permutation methods further enhance robustness and efficacy. The established strategic central limit theorem (SCLT) shows that our approach yields a more concentrated distribution under the null hypothesis and a less concentrated one under the alternative hypothesis, substantially improving statistical power. Experimental results show a significant improvement in A/B testing performance, highlighting the potential to reduce experimental costs while maintaining high statistical power.
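The abstract leaves the allocation and calibration details to the paper. As a rough illustration only, the Python sketch below implements one plausible reading: pull the empirically better arm with a fixed maximum probability, form a pull-count-weighted mean-difference statistic, and calibrate it with a permutation test. The names tab_statistic and permutation_pvalue and the parameter p_max are hypothetical and do not come from the paper.

import numpy as np

def tab_statistic(out_a, out_b, p_max=0.9, seed=0):
    """Maximum-probability-driven two-armed bandit over the two outcome
    streams: at each step, pull the empirically better arm with
    probability p_max, then return a mean-difference statistic
    weighted by the realized pull counts. Illustrative only."""
    rng = np.random.default_rng(seed)
    data = [np.asarray(out_a), np.asarray(out_b)]
    sums, pulls = np.zeros(2), np.zeros(2)
    idx = [0, 0]  # next unread outcome per arm
    for _ in range(min(len(out_a), len(out_b))):
        means = sums / np.maximum(pulls, 1)
        lead = int(np.argmax(means))                # empirically better arm
        arm = lead if rng.random() < p_max else 1 - lead
        sums[arm] += data[arm][idx[arm]]
        pulls[arm] += 1
        idx[arm] += 1
    means = sums / np.maximum(pulls, 1)
    return (means[0] - means[1]) * np.sqrt(pulls.prod() / pulls.sum())

def permutation_pvalue(a, b, n_perm=499, seed=0):
    """Calibrate the bandit statistic by rerunning it on permuted
    pooled outcomes (two-sided p-value)."""
    rng = np.random.default_rng(seed)
    obs = abs(tab_statistic(a, b))
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        hits += abs(tab_statistic(pooled[:len(a)], pooled[len(a):])) >= obs
    return (1 + hits) / (1 + n_perm)

# Toy check with a small positive treatment effect
rng = np.random.default_rng(42)
treat = rng.normal(0.05, 1.0, 2000)
ctrl = rng.normal(0.00, 1.0, 2000)
print(permutation_pvalue(treat, ctrl))

In this reading, the permutation step supplies the null distribution of the bandit-weighted statistic, so calibration does not depend on a normal approximation, which is the failure mode the abstract attributes to traditional methods.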
Cite
Text
Zhang et al. "Strategic A/B Testing via Maximum Probability-Driven Two-Armed Bandit." Proceedings of the 42nd International Conference on Machine Learning, 2025.
Markdown
[Zhang et al. "Strategic A/B Testing via Maximum Probability-Driven Two-Armed Bandit." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/zhang2025icml-strategic/)
BibTeX
@inproceedings{zhang2025icml-strategic,
title = {{Strategic A/B Testing via Maximum Probability-Driven Two-Armed Bandit}},
author = {Zhang, Yu and Zhao, Shanshan and Wan, Bokui and Wang, Jinjuan and Yan, Xiaodong},
booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
year = {2025},
pages = {77069--77089},
volume = {267},
url = {https://mlanthology.org/icml/2025/zhang2025icml-strategic/}
}