Multi-Task Domain Adaptation for Language Grounding with 3D Objects
Abstract
Existing work on object-level language grounding with 3D objects mostly focuses on improving performance with off-the-shelf pre-trained models that capture features such as viewpoint selection or geometric priors. However, it fails to explore cross-modal representations for language-vision alignment in the cross-domain setting. To address this problem, we propose a novel method called Domain Adaptation for Language Grounding (DA4LG) with 3D objects. Specifically, DA4LG consists of a visual adapter module trained with multi-task learning, which achieves vision-language alignment through comprehensive multimodal feature representation. Experimental results demonstrate that DA4LG performs competitively across visual and non-visual language descriptions, independent of the completeness of observation. DA4LG achieves state-of-the-art performance in both the single-view and multi-view settings, with accuracies of 83.8% and 86.8%, respectively, on the language grounding benchmark SNARE. Simulation experiments show that DA4LG is practical and generalizes well compared to existing methods. Our project is available at https://sites.google.com/view/da4lg.
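The abstract does not include code; the following is a minimal PyTorch sketch of the kind of architecture it describes: a small visual adapter on top of frozen pre-trained encoders, trained with multiple task heads (a grounding head plus an auxiliary alignment head). All module names, dimensions, fusion choices, and loss weights are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a DA4LG-style model. Names, dimensions, and the
# auxiliary task are assumptions for illustration, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualAdapter(nn.Module):
    """Bottleneck adapter mapping frozen visual features into the
    language embedding space (dimensions assumed)."""
    def __init__(self, vis_dim=768, lang_dim=512, bottleneck=256):
        super().__init__()
        self.down = nn.Linear(vis_dim, bottleneck)
        self.up = nn.Linear(bottleneck, lang_dim)

    def forward(self, vis_feats):
        # vis_feats: (batch, n_views, vis_dim) pooled per-view features
        h = F.relu(self.down(vis_feats))
        return self.up(h).mean(dim=1)  # aggregate over views

class DA4LGSketch(nn.Module):
    def __init__(self, vis_dim=768, lang_dim=512):
        super().__init__()
        self.adapter = VisualAdapter(vis_dim, lang_dim)
        # Multi-task heads: a grounding head scoring object-language match,
        # plus an auxiliary head encouraging richer feature representation.
        self.grounding_head = nn.Linear(lang_dim, 1)
        self.aux_head = nn.Linear(lang_dim, vis_dim)

    def forward(self, vis_feats, lang_emb):
        v = self.adapter(vis_feats)               # (batch, lang_dim)
        fused = v * lang_emb                      # simple multiplicative fusion
        score = self.grounding_head(fused).squeeze(-1)
        recon = self.aux_head(v)                  # auxiliary prediction
        return score, recon

def multi_task_loss(score, label, recon, vis_target, aux_weight=0.1):
    """Grounding loss plus a weighted auxiliary loss (weight is assumed)."""
    ground = F.binary_cross_entropy_with_logits(score, label)
    aux = F.mse_loss(recon, vis_target)
    return ground + aux_weight * aux
```

In a SNARE-style evaluation, such a model would score each of the two candidate objects against the referring expression and select the higher-scoring one; the auxiliary head is only used during training.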
Cite
Text
Sun et al. "Multi-Task Domain Adaptation for Language Grounding with 3D Objects." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72754-2_22
Markdown
[Sun et al. "Multi-Task Domain Adaptation for Language Grounding with 3D Objects." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/sun2024eccv-multitask/) doi:10.1007/978-3-031-72754-2_22
BibTeX
@inproceedings{sun2024eccv-multitask,
  title = {{Multi-Task Domain Adaptation for Language Grounding with 3D Objects}},
  author = {Sun, Penglei and Song, Yaoxian and Pan, Xinglin and Dong, Peijie and Yang, Xiaofei and Wang, Qiang and Li, Zhixu and Li, Tiefeng and Chu, Xiaowen},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year = {2024},
  doi = {10.1007/978-3-031-72754-2_22},
  url = {https://mlanthology.org/eccv/2024/sun2024eccv-multitask/}
}