Domain Constraints Improve Risk Prediction When Outcome Data Is Missing

Abstract

Machine learning models often predict the outcome of a human decision. For example, if a doctor tests a patient for disease, will the patient test positive? A challenge is that the human decision *censors* the outcome data: we observe test outcomes only for patients whom doctors historically tested. Untested patients, whose outcomes are unobserved, may differ from tested patients along both observed and unobserved dimensions. We describe a Bayesian model of this setting that estimates risk for both tested and untested patients. To aid model estimation, we propose two *domain-specific* constraints that are plausible in health settings: a *prevalence constraint*, under which the overall disease prevalence is known, and an *expertise constraint*, under which the human decision-maker deviates from purely risk-based decision-making only along a constrained set of features. We show theoretically and on synthetic data that these constraints can improve parameter inference. We apply our model to a case study of cancer risk prediction, showing that the model can identify suboptimalities in test allocation and that the prevalence constraint increases the plausibility of inferences.
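
The censored-outcome setting can be made concrete with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation. The logistic link, the Gaussian unobserved term `z`, the form of the testing equation, the hypothetical helpers `simulate` and `prevalence_log_penalty`, and all numeric values are illustrative choices. The sketch shows three things: how testing censors outcomes (`NaN` for untested patients), how an expertise constraint can be encoded by zeroing the decision-maker's deviation coefficients outside an allowed feature set, and how a known prevalence could enter a Bayesian model as a soft penalty on mean predicted risk.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def simulate(n=10_000, d=5, expertise_features=(0, 1), seed=0):
    """Simulate censored outcomes under an ASSUMED generative model (illustrative only).

    risk:     r_i = sigmoid(X_i @ beta_y + z_i), with z_i ~ N(0, 1) unobserved
    testing:  T_i ~ Bernoulli(sigmoid(alpha + gamma * logit(r_i) + X_i @ beta_delta))
    outcome:  Y_i ~ Bernoulli(r_i), observed only when T_i is True
    Expertise constraint (hypothetical encoding): beta_delta is zero outside
    `expertise_features`, so testing deviates from risk only along those features.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))
    beta_y = rng.normal(scale=0.5, size=d)
    z = rng.normal(scale=1.0, size=n)          # unobserved risk factors
    logit_r = X @ beta_y + z - 2.0             # negative intercept keeps prevalence low
    r = sigmoid(logit_r)

    # Deviations from risk-based testing are allowed only on the masked features.
    mask = np.zeros(d, dtype=bool)
    mask[list(expertise_features)] = True
    beta_delta = np.zeros(d)
    beta_delta[mask] = rng.normal(scale=0.5, size=mask.sum())

    # Testing probability rises with true risk (gamma = 1.5 is an assumed value).
    t = rng.random(n) < sigmoid(-1.0 + 1.5 * logit_r + X @ beta_delta)

    y = rng.random(n) < r
    y_obs = np.where(t, y.astype(float), np.nan)   # censoring: NaN when untested
    return X, t, y_obs, r, beta_delta, mask


def prevalence_log_penalty(r_hat, known_prevalence, sd=0.005):
    """Soft prevalence constraint: Gaussian log-density (up to a constant) of mean risk."""
    gap = r_hat.mean() - known_prevalence
    return -0.5 * (gap / sd) ** 2


X, t, y_obs, r, beta_delta, mask = simulate()
print(f"tested fraction: {t.mean():.2f}")
print(f"positive rate among tested: {np.nanmean(y_obs):.2f}")
print(f"true prevalence: {r.mean():.3f}")
```

Because testing in this sketch favors higher-risk patients, the positive rate among tested patients overstates population prevalence. Adding a term like `prevalence_log_penalty` to a model's log-posterior is one way to pull mean predicted risk toward a known prevalence when fitting to such censored data.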

Cite

Balachandar et al. "Domain Constraints Improve Risk Prediction When Outcome Data Is Missing." NeurIPS 2023 Workshops: DistShift, 2023.

BibTeX

@inproceedings{balachandar2023neuripsw-domain,
  title     = {{Domain Constraints Improve Risk Prediction When Outcome Data Is Missing}},
  author    = {Balachandar, Sidhika and Garg, Nikhil and Pierson, Emma},
  booktitle = {NeurIPS 2023 Workshops: DistShift},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/balachandar2023neuripsw-domain/}
}