Phoebus: A System for Extracting and Integrating Data from Unstructured and Ungrammatical Sources

Abstract

With the proliferation of online classifieds and auctions comes a new need to meaningfully search and organize the items for sale. However, since sellers' item descriptions are not structured and do not conform to a standard set of values (think “Chevy” versus “Chevrolet”), searching and organizing this data is difficult. This paper describes a working demonstration of the Phoebus system, which uses both record linkage and information extraction to parse out the meaningful attributes of an item description and assign them standard values. This allows the data to be sorted, searched, and linked to other data sources where standard values for the attributes are required to link the sources together.
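To make the idea concrete, the following is a minimal sketch of linking a free-text post to a set of standard values and then reading off standardized attributes. It is not the Phoebus algorithm, which the abstract does not detail. The reference set, the function names (`link_post`, `standardize`), the use of `difflib.SequenceMatcher` as the similarity measure, and the 0.5 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical reference set of standard attribute values (not from the paper).
REFERENCE_SET = [
    {"make": "Chevrolet", "model": "Impala"},
    {"make": "Honda", "model": "Civic"},
    {"make": "Honda", "model": "Accord"},
]


def token_score(token, value):
    """Character-level similarity in [0, 1] between a post token and a value."""
    return SequenceMatcher(None, token.lower(), value.lower()).ratio()


def best_token_score(tokens, value):
    """Score of the post token that most resembles the given standard value."""
    return max((token_score(t, value) for t in tokens), default=0.0)


def link_post(post):
    """Record-linkage step: pick the reference record that best matches the post."""
    tokens = post.split()

    def record_score(record):
        # Sum, over attributes, of the best-matching token's similarity.
        return sum(best_token_score(tokens, v) for v in record.values())

    return max(REFERENCE_SET, key=record_score)


def standardize(post, threshold=0.5):
    """Extraction step: keep only attributes that some token supports well enough."""
    tokens = post.split()
    record = link_post(post)
    return {
        attr: value
        for attr, value in record.items()
        if best_token_score(tokens, value) >= threshold
    }


if __name__ == "__main__":
    print(standardize("chevy impala 2003 runs great"))
```

In this sketch the linked reference record supplies the standard spelling, so a post that says “chevy” can be stored under “Chevrolet” and then sorted, searched, or joined with other sources on that value. A real system would need a more robust similarity measure and a much larger reference set.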

Cite

Text

Michelson and Knoblock. "Phoebus: A System for Extracting and Integrating Data from Unstructured and Ungrammatical Sources." AAAI Conference on Artificial Intelligence, 2006.

Markdown

[Michelson and Knoblock. "Phoebus: A System for Extracting and Integrating Data from Unstructured and Ungrammatical Sources." AAAI Conference on Artificial Intelligence, 2006.](https://mlanthology.org/aaai/2006/michelson2006aaai-phoebus/)

BibTeX

@inproceedings{michelson2006aaai-phoebus,
  title     = {{Phoebus: A System for Extracting and Integrating Data from Unstructured and Ungrammatical Sources}},
  author    = {Michelson, Matthew and Knoblock, Craig A.},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2006},
  pages     = {1947--1948},
  url       = {https://mlanthology.org/aaai/2006/michelson2006aaai-phoebus/}
}