Handling Missing Data in Decision Trees: A Probabilistic Approach
Abstract
Decision trees are a popular family of models due to their attractive properties, such as interpretability and the ability to handle heterogeneous data. At the same time, missing data is a prevalent occurrence that hinders the performance of machine learning models. Handling missing data in decision trees is therefore a well-studied problem. In this paper, we tackle this problem by taking a probabilistic approach. At deployment time, we use tractable density estimators to compute the "expected prediction" of our models. At learning time, we fine-tune the parameters of already-learned trees by minimizing their "expected prediction loss" with respect to our density estimators. We provide brief experiments showcasing the effectiveness of our methods compared to a few baselines.
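As a rough illustration of the idea (notation is ours, not taken from the paper): write a learned tree as f, the observed features of an instance as x^o, and the missing features as x^m. The expected prediction marginalizes the tree's output over the missing features under a density estimator p, and the expected prediction loss does the same for a training loss ℓ:

% Minimal sketch in our own notation; the paper's exact definitions may differ.
\[
  \mathbb{E}_{x^m \sim p(x^m \mid x^o)}\big[ f(x^o, x^m) \big]
  \qquad \text{and} \qquad
  \mathbb{E}_{x^m \sim p(x^m \mid x^o)}\big[ \ell\big(y, f(x^o, x^m)\big) \big].
\]

The first quantity is what is computed at deployment time; the second is the objective minimized when fine-tuning the tree's parameters.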
Cite
Text
Khosravi et al. "Handling Missing Data in Decision Trees: A Probabilistic Approach." ICML 2020 Workshops: Artemiss, 2020.
Markdown
[Khosravi et al. "Handling Missing Data in Decision Trees: A Probabilistic Approach." ICML 2020 Workshops: Artemiss, 2020.](https://mlanthology.org/icmlw/2020/khosravi2020icmlw-handling/)
BibTeX
@inproceedings{khosravi2020icmlw-handling,
  title = {{Handling Missing Data in Decision Trees: A Probabilistic Approach}},
  author = {Khosravi, Pasha and Vergari, Antonio and Choi, YooJung and Liang, Yitao and Van den Broeck, Guy},
  booktitle = {ICML 2020 Workshops: Artemiss},
  year = {2020},
  url = {https://mlanthology.org/icmlw/2020/khosravi2020icmlw-handling/}
}