A Machine Learning Approach to Building Domain-Specific Search Engines

McCallum, Andrew; Nigam, Kamal; Rennie, Jason; Seymore, Kristie

IJCAI 1999 pp. 662-667

Abstract

Domain-specific search engines are becoming increasingly popular because they offer increased accuracy and extra features not possible with general, Web-wide search engines. Unfortunately, they are also difficult and time-consuming to maintain. This paper proposes the use of machine learning techniques to greatly automate the creation and maintenance of domain-specific search engines. We describe new research in reinforcement learning, text classification and information extraction that enables efficient spidering, populates topic hierarchies, and identifies informative text segments. Using these techniques, we have built a demonstration system: a search engine for computer science research papers available at www.cora.justresearch.com.

1 Introduction

As the amount of information on the World Wide Web grows, it becomes increasingly difficult to find just what we want. While general-purpose search engines such as AltaVista and HotBot offer high coverage, they often provi...

Cite

Text

McCallum et al. "A Machine Learning Approach to Building Domain-Specific Search Engines." International Joint Conference on Artificial Intelligence, 1999.

Markdown

[McCallum et al. "A Machine Learning Approach to Building Domain-Specific Search Engines." International Joint Conference on Artificial Intelligence, 1999.](https://mlanthology.org/ijcai/1999/mccallum1999ijcai-machine/)

BibTeX

@inproceedings{mccallum1999ijcai-machine,
  title     = {{A Machine Learning Approach to Building Domain-Specific Search Engines}},
  author    = {McCallum, Andrew and Nigam, Kamal and Rennie, Jason and Seymore, Kristie},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {1999},
  pages     = {662--667},
  url       = {https://mlanthology.org/ijcai/1999/mccallum1999ijcai-machine/}
}