Structured Matching for Phrase Localization

Wang, Mingzhe; Azab, Mahmoud; Kojima, Noriyuki; Mihalcea, Rada; Deng, Jia

doi:10.1007/978-3-319-46484-8_42

ECCV 2016 pp. 696-711

Abstract

In this paper we introduce a new approach to phrase localization: grounding phrases in sentences to image regions. We propose a structured matching of phrases and regions that encourages the semantic relations between phrases to agree with the visual relations between regions. We formulate structured matching as a discrete optimization problem and relax it to a linear program. We use neural networks to embed regions and phrases into vectors, which then define the similarities (matching weights) between regions and phrases. We integrate structured matching with neural networks to enable end-to-end training. Experiments on Flickr30K Entities demonstrate the empirical effectiveness of our approach.
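To make the idea of structured matching concrete, here is a minimal sketch of the discrete objective on a toy example. It is not the authors' implementation: the paper relaxes the problem to a linear program and learns the embeddings with neural networks, whereas this sketch uses hand-written 2-D vectors, cosine similarity as the matching weight, and exhaustive search in place of the LP. The function name `structured_match`, the hard relation constraint, and all data values are assumptions made for illustration.

```python
from itertools import product
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def structured_match(phrase_vecs, region_vecs, related_phrases, compatible_regions):
    """Search all phrase-to-region assignments and return the best feasible one.

    related_phrases:    set of (i, k) phrase index pairs that share a semantic relation.
    compatible_regions: set of (j, l) region index pairs that share the
                        corresponding visual relation (hypothetical encoding).
    An assignment is feasible only if every related phrase pair maps to a
    compatible region pair. Exhaustive search stands in for the LP relaxation.
    """
    n, m = len(phrase_vecs), len(region_vecs)
    # Matching weights: similarity between each phrase and each region.
    w = [[cosine(p, r) for r in region_vecs] for p in phrase_vecs]
    best_score, best = float("-inf"), None
    for assign in product(range(m), repeat=n):
        if any((assign[i], assign[k]) not in compatible_regions
               for i, k in related_phrases):
            continue  # violates a phrase-relation / region-relation agreement
        score = sum(w[i][assign[i]] for i in range(n))
        if score > best_score:
            best_score, best = score, assign
    return best, best_score


# Toy data (made up): phrase 0 = "a man", phrase 1 = "his hat".
phrases = [[1.0, 0.0], [0.0, 1.0]]
# Region 0 = person box, region 1 = hat on the person, region 2 = hat on a table.
regions = [[1.0, 0.1], [0.2, 0.9], [0.1, 1.0]]

# Without relations, each phrase simply takes its most similar region.
unstructured, _ = structured_match(phrases, regions, set(), set())

# With the relation "hat belongs to man", only region pair (0, 1) is compatible.
structured, _ = structured_match(phrases, regions, {(0, 1)}, {(0, 1)})

print(unstructured)  # (0, 2)
print(structured)    # (0, 1)
```

In this toy case the hat on the table is slightly more similar to "his hat" than the hat on the person, so independent matching picks it. Requiring the phrase relation to agree with a region relation moves the match to the hat that actually sits on the person.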

Cite

Text

Wang et al. "Structured Matching for Phrase Localization." European Conference on Computer Vision, 2016. doi:10.1007/978-3-319-46484-8_42

Markdown

[Wang et al. "Structured Matching for Phrase Localization." European Conference on Computer Vision, 2016.](https://mlanthology.org/eccv/2016/wang2016eccv-structured/) doi:10.1007/978-3-319-46484-8_42

BibTeX

@inproceedings{wang2016eccv-structured,
  title     = {{Structured Matching for Phrase Localization}},
  author    = {Wang, Mingzhe and Azab, Mahmoud and Kojima, Noriyuki and Mihalcea, Rada and Deng, Jia},
  booktitle = {European Conference on Computer Vision},
  year      = {2016},
  pages     = {696-711},
  doi       = {10.1007/978-3-319-46484-8_42},
  url       = {https://mlanthology.org/eccv/2016/wang2016eccv-structured/}
}