Regularized Data Programming with Automated Bayesian Prior Selection

Abstract

The cost of manual data labeling can be a significant obstacle in supervised learning. Data programming (DP) offers a weakly supervised solution for training dataset creation, wherein the outputs of user-defined programmatic labeling functions (LFs) are reconciled through unsupervised learning. However, DP can fail to outperform an unweighted majority vote in some scenarios, including low-data contexts. This work introduces a Bayesian extension of classical DP that mitigates failures of unsupervised learning by augmenting the DP objective with regularization terms. Regularized learning is achieved through maximum a posteriori estimation with informative priors. Majority vote is proposed as a proxy signal for automated prior parameter selection. Results suggest that regularized DP improves upon maximum likelihood and majority voting, confers greater interpretability, and bolsters performance in low-data regimes.
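The sketch below is a minimal illustration (not the authors' released implementation) of the idea summarized above: labeling-function accuracies are estimated by EM with a MAP M-step, where Beta priors are centered at each LF's agreement rate with the unweighted majority vote. The conditionally independent binary generative model, the function names, and the `strength` hyperparameter are assumptions made for illustration only.

```python
import numpy as np

def majority_vote(L):
    """Unweighted majority vote over a label matrix L in {-1, 0, +1}^(n x m); 0 = abstain."""
    return np.sign(L.sum(axis=1))

def mv_beta_priors(L, strength=10.0):
    """Center each Beta(a_j, b_j) prior at LF j's agreement rate with the majority vote."""
    y_mv = majority_vote(L)
    fired = (L != 0).sum(axis=0)
    agree = ((L == y_mv[:, None]) & (L != 0)).sum(axis=0)
    rate = np.where(fired > 0, agree / np.maximum(fired, 1), 0.5)
    return strength * rate + 1.0, strength * (1.0 - rate) + 1.0

def map_em(L, a, b, n_iter=50, eps=1e-3):
    """EM for binary labels with a MAP M-step: Beta priors regularize each LF accuracy."""
    alpha = np.clip((a - 1.0) / (a + b - 2.0), eps, 1.0 - eps)  # start at the prior mode
    for _ in range(n_iter):
        # E-step: posterior P(y_i = +1 | L_i) under conditionally independent LFs
        # and a uniform class prior; abstentions cancel in the likelihood ratio.
        log_pos = np.where(L == 1, np.log(alpha),
                           np.where(L == -1, np.log1p(-alpha), 0.0)).sum(axis=1)
        log_neg = np.where(L == -1, np.log(alpha),
                           np.where(L == 1, np.log1p(-alpha), 0.0)).sum(axis=1)
        p_pos = 1.0 / (1.0 + np.exp(log_neg - log_pos))
        # M-step: MAP update of each accuracy, regularized by its majority-vote prior.
        correct = (p_pos[:, None] * (L == 1) + (1.0 - p_pos)[:, None] * (L == -1)).sum(axis=0)
        fired = (L != 0).sum(axis=0)
        alpha = np.clip((correct + a - 1.0) / (fired + a + b - 2.0), eps, 1.0 - eps)
    return alpha, p_pos
```

As the assumed `strength` parameter approaches zero the priors become uniform and the M-step reduces to the maximum likelihood update, so the prior strength controls how strongly the model is pulled toward the majority-vote proxy.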

Cite

Text

Maasch et al. "Regularized Data Programming with Automated Bayesian Prior Selection." ICML 2023 Workshops: SPIGM, 2023.

Markdown

[Maasch et al. "Regularized Data Programming with Automated Bayesian Prior Selection." ICML 2023 Workshops: SPIGM, 2023.](https://mlanthology.org/icmlw/2023/maasch2023icmlw-regularized/)

BibTeX

@inproceedings{maasch2023icmlw-regularized,
  title     = {{Regularized Data Programming with Automated Bayesian Prior Selection}},
  author    = {Maasch, Jacqueline R. M. A. and Zhang, Hao and Yang, Qian and Wang, Fei and Kuleshov, Volodymyr},
  booktitle = {ICML 2023 Workshops: SPIGM},
  year      = {2023},
  url       = {https://mlanthology.org/icmlw/2023/maasch2023icmlw-regularized/}
}