Joint Prediction of Topics in a URL Hierarchy

Abstract

We study the problem of jointly predicting topics for all web pages within URL hierarchies. We employ a graphical model in which latent variables represent the predominant topic within a subtree of the URL hierarchy. The model is built around a generative process that infers how web site administrators hierarchically structure web sites according to topic, and how web page content is generated depending on the page topic. The resulting predictive model is linear in a joint feature map of content, topic labels, and the latent variables. Inference reduces to message passing in a tree-structured graph; parameter estimation is carried out using concave-convex optimization. We present a case study on web page classification for a targeted advertising application.
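To make the inference step concrete, the following is a minimal sketch of max-sum message passing on a URL tree. It is not the authors' model: the function name `map_topics`, the single agreement bonus `agree` between a node and its parent, the example URLs, and the hand-picked per-page scores are all assumptions chosen for illustration, whereas the paper uses a learned linear model over a joint feature map.

```python
from typing import Dict, List


def map_topics(children: Dict[str, List[str]], unary: Dict[str, List[float]],
               n_topics: int, root: str, agree: float = 0.8) -> Dict[str, int]:
    """Return the highest-scoring joint topic assignment for a tree.

    Total score = sum of per-node scores in `unary` (assumed content scores)
    plus `agree` for every parent/child pair that shares a topic.
    """
    # (child, parent_topic) -> best topic for the child given the parent's topic
    best_given_parent: Dict[tuple, int] = {}

    def upward(node: str) -> List[float]:
        # score[t]: best score of the subtree below `node` if `node` has topic t
        score = list(unary.get(node, [0.0] * n_topics))
        for child in children.get(node, []):
            child_score = upward(child)
            for t in range(n_topics):
                # Message from the child: maximise over the child's own topic.
                options = [child_score[c] + (agree if c == t else 0.0)
                           for c in range(n_topics)]
                best = max(range(n_topics), key=options.__getitem__)
                best_given_parent[(child, t)] = best
                score[t] += options[best]
        return score

    root_score = upward(root)
    assignment = {root: max(range(n_topics), key=root_score.__getitem__)}
    stack = [root]
    while stack:  # downward pass: read off the stored argmax choices
        node = stack.pop()
        for child in children.get(node, []):
            assignment[child] = best_given_parent[(child, assignment[node])]
            stack.append(child)
    return assignment


# Hypothetical site: topic 0 = sports, topic 1 = news.
children = {
    "/sports": ["/sports/a.html", "/sports/archive"],
    "/sports/archive": ["/sports/archive/b.html", "/sports/archive/e.html"],
}
unary = {
    "/sports/a.html": [2.0, 0.0],
    "/sports/archive/b.html": [0.4, 0.5],  # weak local evidence for topic 1
    "/sports/archive/e.html": [2.0, 0.0],
}
topics = map_topics(children, unary, n_topics=2, root="/sports")
```

In this toy example, `/sports/archive/b.html` would be assigned topic 1 if classified on its own, but the joint assignment gives it topic 0 because the rest of its subtree is confidently topic 0. The inner nodes carry no content scores here and play the role of the latent subtree topics.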

Cite

Text

Großhans et al. "Joint Prediction of Topics in a URL Hierarchy." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2014. doi:10.1007/978-3-662-44848-9_33

Markdown

[Großhans et al. "Joint Prediction of Topics in a URL Hierarchy." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2014.](https://mlanthology.org/ecmlpkdd/2014/grohans2014ecmlpkdd-joint/) doi:10.1007/978-3-662-44848-9_33

BibTeX

@inproceedings{grohans2014ecmlpkdd-joint,
  title     = {{Joint Prediction of Topics in a URL Hierarchy}},
  author    = {Großhans, Michael and Sawade, Christoph and Scheffer, Tobias and Landwehr, Niels},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2014},
  pages     = {514--529},
  doi       = {10.1007/978-3-662-44848-9_33},
  url       = {https://mlanthology.org/ecmlpkdd/2014/grohans2014ecmlpkdd-joint/}
}