Unpacking Multi-Valued Symbolic Features and Classes in Memory-Based Language Learning

Abstract

In supervised machine-learning applications to natural language processing, tasks are typically formulated as classification tasks mapping multi-valued features to multi-valued classes. Memory-based or instance-based learning algorithms are suited to such representations, but they are not restricted to them: both features and classes may be unpacked into binary values. We demonstrate, in a matrix of empirical tests on a range of natural language learning tasks, that when k = 1 is used in the k-NN classifier kernel, binary unpacking of features and classes tends to harm generalization accuracy. Unpacking features and classes causes the kernel classifier to rely on smaller sets of nearest neighbors, which generally leads to more misclassifications; only when the data is not sparse in the multi-valued case (that is, when the average number of equidistant nearest neighbors is well above a handful) can unpacking lead to improved generalization accuracy.
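To make "unpacking" and "sets of equidistant nearest neighbors" concrete, here is a minimal sketch. It is not the authors' implementation. It assumes a weighted overlap distance, where each mismatching feature adds its weight. It also assumes that unpacking turns every (feature, value) pair into one binary feature. The helper names (`build_unpacker`, `weighted_overlap`, `nearest_set`), the toy data, and the per-bit weights are all hypothetical. With uniform weights, unpacking only doubles every distance, so the nearest-neighbor set is unchanged. Once each binary feature carries its own (here, made-up) weight, previously equidistant neighbors can separate, and the set shrinks.

```python
from typing import Callable, List, Sequence, Tuple

Instance = Tuple[str, ...]


def build_unpacker(memory: Sequence[Instance]) -> Callable[[Instance], Tuple[int, ...]]:
    """Hypothetical helper: map a multi-valued instance to binary (one-hot) features.

    Each (feature, value) pair observed in memory becomes one binary feature.
    """
    n_features = len(memory[0])
    vocab = [sorted({inst[i] for inst in memory}) for i in range(n_features)]

    def encode(inst: Instance) -> Tuple[int, ...]:
        bits: List[int] = []
        for i, values in enumerate(vocab):
            bits.extend(1 if inst[i] == v else 0 for v in values)
        return tuple(bits)

    return encode


def weighted_overlap(a: Sequence, b: Sequence, weights: Sequence[float]) -> float:
    """Sum the weights of the features on which a and b differ."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


def nearest_set(query: Sequence, memory: Sequence[Sequence], weights: Sequence[float]) -> List[int]:
    """Return the indices of all memory instances at the minimal distance (the 1-NN set)."""
    dists = [weighted_overlap(query, m, weights) for m in memory]
    best = min(dists)
    return [i for i, d in enumerate(dists) if d == best]


if __name__ == "__main__":
    memory = [("a", "x"), ("b", "x"), ("c", "y")]  # toy data, made up
    query = ("a", "y")

    # Multi-valued features, uniform weights: two equidistant neighbors.
    print(nearest_set(query, memory, [1.0, 1.0]))  # [0, 2]

    encode = build_unpacker(memory)
    bin_memory = [encode(m) for m in memory]
    bin_query = encode(query)

    # Unpacked, uniform weights: distances double, same neighbor set.
    print(nearest_set(bin_query, bin_memory, [1.0] * 5))  # [0, 2]

    # Unpacked, hypothetical per-bit weights: the tie breaks.
    print(nearest_set(bin_query, bin_memory, [1.0, 1.0, 0.5, 1.0, 1.0]))  # [2]
```

A 1-NN classifier would vote over these index sets. A smaller set means the prediction rests on fewer stored examples, which is the effect the abstract links to more misclassifications on sparse data.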

Cite

Text

van den Bosch and Zavrel. "Unpacking Multi-Valued Symbolic Features and Classes in Memory-Based Language Learning." International Conference on Machine Learning, 2000.

Markdown

[van den Bosch and Zavrel. "Unpacking Multi-Valued Symbolic Features and Classes in Memory-Based Language Learning." International Conference on Machine Learning, 2000.](https://mlanthology.org/icml/2000/vandenbosch2000icml-unpacking/)

BibTeX

@inproceedings{vandenbosch2000icml-unpacking,
  title     = {{Unpacking Multi-Valued Symbolic Features and Classes in Memory-Based Language Learning}},
  author    = {van den Bosch, Antal and Zavrel, Jakub},
  booktitle = {International Conference on Machine Learning},
  year      = {2000},
  pages     = {1055--1062},
  url       = {https://mlanthology.org/icml/2000/vandenbosch2000icml-unpacking/}
}