Learning Visual-Semantic Subspace Representations

Abstract

Learning image representations that capture rich semantic relationships remains a significant challenge. Existing approaches either rely on contrastive objectives, which lack robust theoretical guarantees, or struggle to effectively represent the partial orders inherent to structured visual-semantic data. In this paper, we introduce a nuclear norm-based loss function, grounded in the same information-theoretic principles that have proved effective in self-supervised learning. We present a theoretical characterization of this loss, demonstrating that, in addition to promoting class orthogonality, it encodes the spectral geometry of the data within a subspace lattice. This geometric representation allows us to associate logical propositions with subspaces, ensuring that our learned representations adhere to a predefined symbolic structure.
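
To make the nuclear norm-based objective concrete, here is a minimal sketch in PyTorch. It is not the authors' implementation: the function name nuclear_norm_loss, the batch and label shapes, and the within-class-minus-total formulation are illustrative assumptions, chosen only to show how penalizing per-class nuclear norms while rewarding the batch-level nuclear norm can promote class orthogonality.

import torch

def nuclear_norm_loss(z: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # z: (batch, dim) embeddings (typically L2-normalized); labels: (batch,) class ids.
    # Nuclear norm of the full batch: the sum of its singular values.
    total = torch.linalg.matrix_norm(z, ord='nuc')
    # Per-class nuclear norms: pushing these down compresses each class
    # toward a low-dimensional subspace.
    within = sum(
        torch.linalg.matrix_norm(z[labels == c], ord='nuc')
        for c in labels.unique()
    )
    # Minimizing within-class spectral energy while maximizing the batch-level
    # spectral energy spreads class subspaces apart.
    return within - total

Under these assumptions, the per-class terms collapse each class onto a low-dimensional subspace, while the batch-level term keeps those subspaces spectrally spread out, which is consistent with the class-orthogonality property the abstract describes.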

Cite

Text

Moreira et al. "Learning Visual-Semantic Subspace Representations." Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, 2025.

Markdown

[Moreira et al. "Learning Visual-Semantic Subspace Representations." Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, 2025.](https://mlanthology.org/aistats/2025/moreira2025aistats-learning/)

BibTeX

@inproceedings{moreira2025aistats-learning,
  title     = {{Learning Visual-Semantic Subspace Representations}},
  author    = {Moreira, Gabriel and Marques, Manuel and Costeira, Joao and Hauptmann, Alexander G},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  year      = {2025},
  pages     = {3727--3735},
  volume    = {258},
  url       = {https://mlanthology.org/aistats/2025/moreira2025aistats-learning/}
}