Agnostic Classification of Markovian Sequences
Abstract
Classification of finite sequences without explicit knowledge of their statistical nature is a fundamental problem with many important applications. We propose a new information theoretic approach to this problem which is based on the following ingredients: (i) sequences are similar when they are likely to be generated by the same source; (ii) cross entropies can be estimated via "universal compression"; (iii) Markovian sequences can be asymptotically optimally merged. With these ingredients we design a method for the classification of discrete sequences whenever they can be compressed. We introduce the method and illustrate its application for hierarchical clustering of languages and for estimating similarities of protein sequences.
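The second ingredient, estimating cross entropies via universal compression, can be illustrated with an off-the-shelf compressor. The sketch below is not the authors' procedure (which relies on merging Markovian sequences and a specific universal compression scheme); it only shows, as an assumption-laden illustration, how a standard compressor such as zlib yields a rough, symmetrized dissimilarity between two byte sequences. All function names are invented for this example.

import zlib


def compressed_size(data: bytes) -> int:
    # Length in bytes of the zlib-compressed representation of data.
    return len(zlib.compress(data, 9))


def cross_entropy_estimate(reference: bytes, target: bytes) -> float:
    # Incremental code length paid for target after the compressor has
    # already seen reference, per symbol of target. This is a stand-in
    # for "cross entropy via universal compression", with zlib playing
    # the role of the universal compressor.
    if not target:
        return 0.0
    return (compressed_size(reference + target) - compressed_size(reference)) / len(target)


def dissimilarity(x: bytes, y: bytes) -> float:
    # Symmetrized score: small when each sequence compresses well given
    # the other, i.e. when they plausibly come from the same source.
    return 0.5 * (cross_entropy_estimate(x, y) + cross_entropy_estimate(y, x))


if __name__ == "__main__":
    english_a = b"the quick brown fox jumps over the lazy dog " * 50
    english_b = b"a lazy dog naps while the quick brown fox runs " * 50
    random_ish = bytes(range(256)) * 8
    print(dissimilarity(english_a, english_b))   # comparatively small
    print(dissimilarity(english_a, random_ish))  # comparatively large

Pairwise scores of this kind can then be fed to a standard hierarchical clustering routine, in the spirit of the language-clustering experiment mentioned in the abstract.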
Cite
Text
El-Yaniv et al. "Agnostic Classification of Markovian Sequences." Neural Information Processing Systems, 1997.
BibTeX
@inproceedings{elyaniv1997neurips-agnostic,
title = {{Agnostic Classification of Markovian Sequences}},
author = {El-Yaniv, Ran and Fine, Shai and Tishby, Naftali},
booktitle = {Neural Information Processing Systems},
year = {1997},
pages = {465--471},
url = {https://mlanthology.org/neurips/1997/elyaniv1997neurips-agnostic/}
}