Cross-Modal Representation Learning and Relation Reasoning for Bidirectional Adaptive Manipulation
Abstract
Since single-modal controllable manipulation typically requires supervision from other modalities or cooperation with complex software and domain experts, this paper addresses the problem of cross-modal adaptive manipulation (CAM). This novel task performs cross-modal semantic alignment under mutual supervision and carries out a bidirectional exchange of attributes, relations, or objects in parallel, benefiting both modalities while significantly reducing manual effort. We introduce a robust solution for CAM that comprises two essential modules: Heterogeneous Representation Learning (HRL) and Cross-modal Relation Reasoning (CRR). The former performs representation learning on heterogeneous graph nodes for cross-modal semantic alignment. The latter identifies and exchanges the focused attributes, relations, or objects in both modalities. Our method produces pleasing cross-modal outputs on the CUB and Visual Genome datasets.
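To make the cross-modal semantic alignment idea concrete, the following is a minimal, illustrative sketch (not the paper's actual HRL module): visual and textual graph-node features are projected into a shared space and aligned with a symmetric contrastive objective. All dimensions, layer choices, and the InfoNCE-style loss are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalAlignment(nn.Module):
    """Illustrative sketch: align image-node and text-node features in a
    shared embedding space. Layer sizes and the contrastive objective are
    assumptions, not the published HRL implementation."""

    def __init__(self, img_dim=2048, txt_dim=768, joint_dim=256, temperature=0.07):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, joint_dim)
        self.txt_proj = nn.Linear(txt_dim, joint_dim)
        self.temperature = temperature

    def forward(self, img_nodes, txt_nodes):
        # img_nodes: (N, img_dim) visual graph-node features
        # txt_nodes: (N, txt_dim) textual graph-node features, paired by index
        img = F.normalize(self.img_proj(img_nodes), dim=-1)
        txt = F.normalize(self.txt_proj(txt_nodes), dim=-1)
        logits = img @ txt.t() / self.temperature  # (N, N) pairwise similarities
        targets = torch.arange(img.size(0), device=img.device)
        # symmetric cross-entropy: each image node should match its text node and vice versa
        loss = (F.cross_entropy(logits, targets) +
                F.cross_entropy(logits.t(), targets)) / 2
        return loss


if __name__ == "__main__":
    # random features stand in for heterogeneous graph-node embeddings
    model = CrossModalAlignment()
    loss = model(torch.randn(8, 2048), torch.randn(8, 768))
    print(loss.item())
```

Once node features from the two modalities live in a shared space, the exchange step (as performed by CRR in the paper) can be framed as matching and swapping aligned attribute, relation, or object nodes between the two graphs.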
Cite
Text
Li et al. "Cross-Modal Representation Learning and Relation Reasoning for Bidirectional Adaptive Manipulation." International Joint Conference on Artificial Intelligence, 2022. doi:10.24963/IJCAI.2022/447Markdown
[Li et al. "Cross-Modal Representation Learning and Relation Reasoning for Bidirectional Adaptive Manipulation." International Joint Conference on Artificial Intelligence, 2022.](https://mlanthology.org/ijcai/2022/li2022ijcai-cross/) doi:10.24963/IJCAI.2022/447BibTeX
@inproceedings{li2022ijcai-cross,
title = {{Cross-Modal Representation Learning and Relation Reasoning for Bidirectional Adaptive Manipulation}},
author = {Li, Lei and Fan, Kai and Yuan, Chun},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2022},
pages = {3222-3228},
doi = {10.24963/IJCAI.2022/447},
url = {https://mlanthology.org/ijcai/2022/li2022ijcai-cross/}
}