Addressing Imbalance in Multi-Label Classification Using Structured Hellinger Forests

Abstract

The multi-label classification problem involves finding a model that maps a set of input features to more than one output label. Class imbalance is a serious issue in multi-label classification. We introduce an extension of structured forests, a type of random forest used for structured prediction, called Sparse Oblique Structured Hellinger Forests (SOSHF). We explore using structured forests in the general multi-label setting and propose a new imbalance-aware formulation that alters how the splitting functions are learned in two ways. First, we account for cost-sensitivity when converting the multi-label problem to a single-label problem at each node in the tree. Second, we introduce a new objective function for determining oblique splits based on the Hellinger distance, a splitting criterion that has been shown to be robust to class imbalance. We empirically validate our method on a number of benchmarks, where it improves on standard and state-of-the-art multi-label classification algorithms.
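To illustrate why the Hellinger distance is considered robust to class imbalance, the following minimal sketch scores a simple axis-aligned threshold split on binary labels, using the form commonly found in Hellinger distance decision trees. It is not the oblique, structured objective proposed in the paper; the function name `hellinger_split_value` and its parameters are hypothetical and chosen only for illustration.

```python
import math


def hellinger_split_value(feature, labels, threshold):
    """Score the binary split `feature <= threshold` with the Hellinger distance.

    Assumption: labels are 0 (negative) or 1 (positive). The score compares,
    branch by branch, the fraction of all positives and the fraction of all
    negatives that fall into that branch. Because each class is normalized by
    its own size, the class prior does not enter the score.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    if n_pos == 0 or n_neg == 0:
        return 0.0  # only one class present: no split can separate anything

    total = 0.0
    for goes_left in (True, False):
        pos_in = sum(
            1 for x, y in zip(feature, labels)
            if y == 1 and (x <= threshold) == goes_left
        )
        neg_in = sum(
            1 for x, y in zip(feature, labels)
            if y == 0 and (x <= threshold) == goes_left
        )
        total += (math.sqrt(pos_in / n_pos) - math.sqrt(neg_in / n_neg)) ** 2
    return math.sqrt(total)


# A balanced toy set and a heavily imbalanced one with the same separation.
balanced = hellinger_split_value([1.0, 10.0], [0, 1], threshold=5.0)
imbalanced = hellinger_split_value([1.0] * 100 + [10.0], [0] * 100 + [1], threshold=5.0)
```

In this sketch both toy sets receive the same score, because adding more negatives on the same side of the threshold leaves the per-class branch fractions unchanged. The score ranges from 0 for a split that distributes both classes identically to the square root of 2 for a split that separates them completely.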

Cite

Text

Daniels and Metaxas. "Addressing Imbalance in Multi-Label Classification Using Structured Hellinger Forests." AAAI Conference on Artificial Intelligence, 2017. doi:10.1609/AAAI.V31I1.10908

Markdown

[Daniels and Metaxas. "Addressing Imbalance in Multi-Label Classification Using Structured Hellinger Forests." AAAI Conference on Artificial Intelligence, 2017.](https://mlanthology.org/aaai/2017/daniels2017aaai-addressing/) doi:10.1609/AAAI.V31I1.10908

BibTeX

@inproceedings{daniels2017aaai-addressing,
  title     = {{Addressing Imbalance in Multi-Label Classification Using Structured Hellinger Forests}},
  author    = {Daniels, Zachary Alan and Metaxas, Dimitris N.},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2017},
  pages     = {1826-1832},
  doi       = {10.1609/AAAI.V31I1.10908},
  url       = {https://mlanthology.org/aaai/2017/daniels2017aaai-addressing/}
}