Discrete Latent Perspective Learning for Segmentation and Detection

Abstract

In this paper, we address the challenge of Perspective-Invariant Learning in machine learning and computer vision: enabling a network to understand images from varying perspectives and produce consistent semantic interpretations. Whereas standard approaches rely on labor-intensive collection of multi-view images or on limited data augmentation, we propose a novel framework, Discrete Latent Perspective Learning (DLPL), which performs latent multi-perspective fusion learning from conventional single-view images. DLPL comprises three main modules: Perspective Discrete Decomposition (PDD), Perspective Homography Transformation (PHT), and Perspective Invariant Attention (PIA), which discretize visual features, transform perspectives, and fuse multi-perspective semantic information, respectively. DLPL is a universal perspective learning framework applicable to a variety of scenarios and vision tasks. Extensive experiments demonstrate that DLPL significantly improves the network's ability to interpret images across diverse scenarios (daily photos, UAV imagery, autonomous driving) and tasks (detection, segmentation).
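To make the PHT step concrete, the following is a minimal PyTorch sketch of the kind of homography warp a perspective-transformation module could apply to latent feature maps. The function name, homography parameters, and tensor shapes here are illustrative assumptions, not the authors' implementation; PDD's discretization and PIA's attention-based fusion are omitted.

```python
# Illustrative sketch only: warping a latent feature map with a 3x3 homography,
# in the spirit of Perspective Homography Transformation (PHT). Not the paper's code.
import torch
import torch.nn.functional as F

def warp_features_with_homography(feat: torch.Tensor, H: torch.Tensor) -> torch.Tensor:
    """Warp a feature map (B, C, h, w) with a 3x3 homography H defined in
    normalized [-1, 1] coordinates, using grid_sample."""
    B, C, h, w = feat.shape
    # Build a grid of homogeneous coordinates (x, y, 1) over the feature map.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij"
    )
    grid = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1).reshape(-1, 3)
    # Apply the homography, then dehomogenize (divide by the last coordinate).
    warped = grid @ H.T                                      # (h*w, 3)
    warped = warped[:, :2] / warped[:, 2:].clamp(min=1e-8)
    warped = warped.reshape(1, h, w, 2).expand(B, -1, -1, -1)
    return F.grid_sample(feat, warped, align_corners=True)

# Example: a mild perspective tilt (hypothetical parameters) on random features.
H = torch.tensor([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.1, 0.0, 1.0]])
feat = torch.randn(2, 64, 32, 32)
warped_feat = warp_features_with_homography(feat, H)
print(warped_feat.shape)  # torch.Size([2, 64, 32, 32])
```

In a full pipeline along the lines the abstract describes, several such warped views of the same features would then be fused (e.g., by an attention module like PIA) to yield a perspective-invariant representation.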

Cite

Text

Ji et al. "Discrete Latent Perspective Learning for Segmentation and Detection." International Conference on Machine Learning, 2024.

Markdown

[Ji et al. "Discrete Latent Perspective Learning for Segmentation and Detection." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/ji2024icml-discrete/)

BibTeX

@inproceedings{ji2024icml-discrete,
  title     = {{Discrete Latent Perspective Learning for Segmentation and Detection}},
  author    = {Ji, Deyi and Zhao, Feng and Zhu, Lanyun and Jin, Wenwei and Lu, Hongtao and Ye, Jieping},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {21719--21730},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/ji2024icml-discrete/}
}