Sense Discovery via Co-Clustering on Images and Text

Abstract

We present a co-clustering framework that can be used to discover multiple semantic and visual senses of a given Noun Phrase (NP). Unlike traditional clustering approaches, which assume a one-to-one mapping between the clusters in the text-based feature space and the visual space, we adopt a one-to-many mapping between the two spaces. This is primarily because each semantic sense (concept) can correspond to different visual senses due to viewpoint and appearance variations. Our structure-EM-style optimization not only extracts the multiple senses in both the semantic and visual feature spaces, but also discovers the mapping between the senses. We introduce a challenging dataset (CMU Polysemy-30) for this problem consisting of 30 NPs ($\sim$5600 labeled instances out of $\sim$22K total instances). We have also conducted a large-scale experiment that performs sense disambiguation for $\sim$2000 NPs.
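
To make the one-to-many mapping concrete, the following is a minimal sketch and not the paper's structure-EM optimization. It assumes that each instance already carries a semantic-sense label and a visual-cluster label, and it uses a simple majority vote (a swapped-in technique) to attach every visual cluster to one semantic sense. The function name `map_visual_to_semantic` and the toy "apple" data are hypothetical.

```python
from collections import Counter, defaultdict


def map_visual_to_semantic(semantic_labels, visual_labels):
    """Attach each visual cluster to the semantic sense it co-occurs with most.

    Returns a dict from semantic sense to the list of visual clusters it owns,
    so one semantic sense may map to several visual senses (one-to-many).
    """
    if len(semantic_labels) != len(visual_labels):
        raise ValueError("label lists must have the same length")

    # Count, per visual cluster, how often each semantic sense appears in it.
    votes = defaultdict(Counter)
    for sem, vis in zip(semantic_labels, visual_labels):
        votes[vis][sem] += 1

    # Majority vote: every visual cluster gets exactly one semantic sense.
    visual_to_semantic = {
        vis: counts.most_common(1)[0][0] for vis, counts in votes.items()
    }

    # Invert the assignment to expose the one-to-many structure.
    semantic_to_visual = defaultdict(list)
    for vis, sem in sorted(visual_to_semantic.items()):
        semantic_to_visual[sem].append(vis)
    return dict(semantic_to_visual)


# Hypothetical toy instances for the NP "apple" (labels are made up).
semantic = ["fruit", "fruit", "fruit", "fruit", "company", "company"]
visual = [0, 0, 1, 1, 2, 2]  # e.g. 0 = whole apples, 1 = sliced apples, 2 = logos

mapping = map_visual_to_semantic(semantic, visual)
print(mapping)
```

In this toy case the "fruit" sense owns two visual clusters while "company" owns one, which is the kind of structure a one-to-one mapping could not represent. The framework described in the abstract discovers the senses and the mapping jointly rather than in a fixed post-hoc step like this one.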

Cite

Text

Chen et al. "Sense Discovery via Co-Clustering on Images and Text." Conference on Computer Vision and Pattern Recognition, 2015. doi:10.1109/CVPR.2015.7299167

Markdown

[Chen et al. "Sense Discovery via Co-Clustering on Images and Text." Conference on Computer Vision and Pattern Recognition, 2015.](https://mlanthology.org/cvpr/2015/chen2015cvpr-sense/) doi:10.1109/CVPR.2015.7299167

BibTeX

@inproceedings{chen2015cvpr-sense,
  title     = {{Sense Discovery via Co-Clustering on Images and Text}},
  author    = {Chen, Xinlei and Ritter, Alan and Gupta, Abhinav and Mitchell, Tom},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2015},
  doi       = {10.1109/CVPR.2015.7299167},
  url       = {https://mlanthology.org/cvpr/2015/chen2015cvpr-sense/}
}