WeakMCN: Multi-Task Collaborative Network for Weakly Supervised Referring Expression Comprehension and Segmentation

Abstract

Weakly supervised referring expression comprehension (WREC) and segmentation (WRES) aim to learn object grounding from a given expression using weak supervision signals such as image-text pairs. While these tasks have traditionally been modeled separately, we argue that they can benefit from joint learning in a multi-task framework. To this end, we propose WeakMCN, a novel multi-task collaborative network that effectively combines WREC and WRES with a dual-branch architecture. Specifically, the WREC branch is formulated as anchor-based contrastive learning, which also acts as a teacher to supervise the WRES branch. In WeakMCN, we propose two innovative designs to facilitate multi-task collaboration, namely Dynamic Visual Feature Enhancement (DVFE) and Collaborative Consistency Module (CCM). DVFE dynamically combines various pre-trained visual knowledge to meet different task requirements, while CCM promotes cross-task consistency from the perspective of optimization. Extensive experimental results on three popular REC and RES benchmarks, i.e., RefCOCO, RefCOCO+, and RefCOCOg, consistently demonstrate performance gains of WeakMCN over state-of-the-art single-task alternatives, e.g., up to 3.91% and 13.11% on RefCOCO for WREC and WRES tasks, respectively. Furthermore, experiments validate the strong generalization ability of WeakMCN in both semi-supervised REC and RES settings against existing methods, e.g., +8.94% for semi-REC and +7.71% for semi-RES on 1% RefCOCO.
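The abstract gives no implementation details for the anchor-based contrastive formulation, so the following is only a minimal NumPy sketch of one common way such an objective can be set up, not the authors' method. It assumes each image provides a fixed set of anchor features, each (text, image) pair is scored by its best-matching anchor, and an InfoNCE-style loss treats matched image-text pairs in a batch as positives. The function names, tensor shapes, and the `temperature` value are all hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length along `axis` (eps avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def anchor_contrastive_loss(anchor_feats, text_feats, temperature=0.1):
    """Illustrative anchor-based image-text contrastive loss (assumed form).

    anchor_feats: (B, N, D) array, N anchor features for each of B images.
    text_feats:   (B, D) array, one expression embedding per image.
    Returns (loss, best_anchor), where best_anchor[i] is the index of the
    anchor of image i that is most similar to its own expression.
    """
    batch = anchor_feats.shape[0]
    a = l2_normalize(anchor_feats)
    t = l2_normalize(text_feats)

    # sim[i, j, n]: cosine similarity between text i and anchor n of image j.
    sim = np.einsum("id,jnd->ijn", t, a)

    # Assumption: a (text, image) pair is scored by its best-matching anchor.
    pair_scores = sim.max(axis=2) / temperature  # (B, B)

    # InfoNCE over images for each text; the diagonal holds the positives.
    shifted = pair_scores - pair_scores.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    loss = -np.mean(np.diag(log_probs))

    # Best anchor of each image for its own expression.
    idx = np.arange(batch)
    best_anchor = sim[idx, idx].argmax(axis=1)
    return loss, best_anchor


# Toy usage with random features (purely synthetic data).
rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 5, 8))
texts = anchors[:, 2, :].copy()  # make anchor 2 the true match for each image
loss, best_anchor = anchor_contrastive_loss(anchors, texts)
```

In a teacher-student setup like the one the abstract describes, the box associated with `best_anchor` could plausibly serve as a pseudo-label for the segmentation branch; how WeakMCN actually passes supervision from the WREC branch to the WRES branch is not specified here.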

Cite

Text

Cheng et al. "WeakMCN: Multi-Task Collaborative Network for Weakly Supervised Referring Expression Comprehension and Segmentation." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.00857

Markdown

[Cheng et al. "WeakMCN: Multi-Task Collaborative Network for Weakly Supervised Referring Expression Comprehension and Segmentation." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/cheng2025cvpr-weakmcn/) doi:10.1109/CVPR52734.2025.00857

BibTeX

@inproceedings{cheng2025cvpr-weakmcn,
  title     = {{WeakMCN: Multi-Task Collaborative Network for Weakly Supervised Referring Expression Comprehension and Segmentation}},
  author    = {Cheng, Silin and Liu, Yang and He, Xinwei and Ourselin, Sebastien and Tan, Lei and Luo, Gen},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {9175-9185},
  doi       = {10.1109/CVPR52734.2025.00857},
  url       = {https://mlanthology.org/cvpr/2025/cheng2025cvpr-weakmcn/}
}