Boundary Matching and Refinement Network with Cross-Modal Contrastive Learning for Temporal Moment Localization
Abstract
Temporal Moment Localization (TML) identifies specific temporal intervals in untrimmed videos based on a sentence query. Traditional methods using 2D temporal maps face limitations due to fixed proposal boundaries and GPU memory constraints. We propose a Boundary Matching and Refinement Network (BMRN) that dynamically adjusts moment proposals with predicted center and length offsets for precise localization. BMRN integrates boundary matching and refinement maps with a length-aware cross-modal interactive proposal feature map. Enhanced with Cross-Modal Contrastive Learning (CCL), BMRN-CCL reduces the impact of visually and semantically similar negative samples. Extensive ablation studies and benchmarks on the Charades-STA and ActivityNet Captions datasets demonstrate the superior performance of BMRN and BMRN-CCL, surpassing state-of-the-art methods.
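The two key ideas in the abstract can be sketched in a few lines: refining a moment proposal with predicted center and length offsets, and an InfoNCE-style contrastive loss that pushes the query embedding toward the matched moment and away from similar negatives. The offset parameterization and loss form below are illustrative assumptions, not the paper's exact formulation.

```python
import math

def refine_proposal(center, length, d_center, d_length):
    """Adjust a moment proposal (center, length) with predicted offsets.
    The parameterization (shift scaled by length, log-space length scaling)
    is a common box-regression convention and an assumption here."""
    new_center = center + d_center * length
    new_length = length * math.exp(d_length)
    # Return the refined (start, end) interval in normalized time.
    return new_center - new_length / 2, new_center + new_length / 2

def contrastive_loss(sim_pos, sims_neg, temperature=0.1):
    """InfoNCE-style cross-modal contrastive loss over cosine similarities:
    the positive moment-query pair is contrasted against negative moments.
    A hypothetical stand-in for the paper's CCL objective."""
    logits = [sim_pos / temperature] + [s / temperature for s in sims_neg]
    m = max(logits)  # subtract the max for numerical stability
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_denom)
```

With zero offsets, `refine_proposal` returns the original interval; a higher positive similarity relative to the negatives drives `contrastive_loss` toward zero.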
Cite
Text
Moon et al. "Boundary Matching and Refinement Network with Cross-Modal Contrastive Learning for Temporal Moment Localization." European Conference on Computer Vision Workshops, 2024. doi:10.1007/978-3-031-91581-9_21
Markdown
[Moon et al. "Boundary Matching and Refinement Network with Cross-Modal Contrastive Learning for Temporal Moment Localization." European Conference on Computer Vision Workshops, 2024.](https://mlanthology.org/eccvw/2024/moon2024eccvw-boundary/) doi:10.1007/978-3-031-91581-9_21
BibTeX
@inproceedings{moon2024eccvw-boundary,
title = {{Boundary Matching and Refinement Network with Cross-Modal Contrastive Learning for Temporal Moment Localization}},
author = {Moon, Jinyoung and Seol, Muah and Kim, Jonghee},
booktitle = {European Conference on Computer Vision Workshops},
year = {2024},
pages = {294--310},
doi = {10.1007/978-3-031-91581-9_21},
url = {https://mlanthology.org/eccvw/2024/moon2024eccvw-boundary/}
}