Overview of AutoFeed: An Unsupervised Learning System for Generating Webfeeds

Gazen, Bora; Minton, Steven

AAAI 2006 pp. 1601-1604

Abstract

The AutoFeed system automatically extracts data from semi-structured web sites. Previously, researchers have developed two types of supervised learning approaches for extracting web data: methods that create precise, site-specific extraction rules and methods that learn less-precise site-independent extraction rules. In either case, significant training is required. AutoFeed follows a third, more ambitious approach, in which unsupervised learning is used to analyze sites and discover their structure. Our method relies on a set of heterogeneous “experts”, each of which is capable of identifying certain types of generic structure. Each expert represents its discoveries as “hints”. Based on these hints, our system clusters the pages and identifies semi-structured data that can be extracted. To identify a good clustering, we use a probabilistic model of the hint-generation process. This paper summarizes our formulation of the fully-automatic web-extraction problem, our clustering approach, and our results on a set of experiments.

Cite

Text

Gazen and Minton. "Overview of AutoFeed: An Unsupervised Learning System for Generating Webfeeds." AAAI Conference on Artificial Intelligence, 2006.

Markdown

[Gazen and Minton. "Overview of AutoFeed: An Unsupervised Learning System for Generating Webfeeds." AAAI Conference on Artificial Intelligence, 2006.](https://mlanthology.org/aaai/2006/gazen2006aaai-overview/)

BibTeX

@inproceedings{gazen2006aaai-overview,
  title     = {{Overview of AutoFeed: An Unsupervised Learning System for Generating Webfeeds}},
  author    = {Gazen, Bora and Minton, Steven},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2006},
  pages     = {1601-1604},
  url       = {https://mlanthology.org/aaai/2006/gazen2006aaai-overview/}
}