RIPE: Reinforcement Learning on Unlabeled Image Pairs for Robust Keypoint Extraction

Abstract

We introduce RIPE, an innovative reinforcement learning-based framework for weakly-supervised training of a keypoint extractor that excels in both detection and description tasks. In contrast to conventional training regimes that depend heavily on artificial transformations, pre-generated models, or 3D data, RIPE requires only a binary label indicating whether paired images represent the same scene. This minimal supervision significantly expands the pool of training data, enabling the creation of a highly generalized and robust keypoint extractor. RIPE utilizes the encoder's intermediate layers for the description of the keypoints with a hyper-column approach to integrate information from different scales. Additionally, we propose an auxiliary loss to enhance the discriminative capability of the learned descriptors. Comprehensive evaluations on standard benchmarks demonstrate that RIPE simplifies data preparation while achieving competitive performance compared to state-of-the-art techniques, marking a significant advancement in robust keypoint extraction and description. Code and data will be made available for research purposes.

Cite

Text

Künzel et al. "RIPE: Reinforcement Learning on Unlabeled Image Pairs for Robust Keypoint Extraction." International Conference on Computer Vision, 2025.

Markdown

[Künzel et al. "RIPE: Reinforcement Learning on Unlabeled Image Pairs for Robust Keypoint Extraction." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/kunzel2025iccv-ripe/)

BibTeX

@inproceedings{kunzel2025iccv-ripe,
  title     = {{RIPE: Reinforcement Learning on Unlabeled Image Pairs for Robust Keypoint Extraction}},
  author    = {Künzel, Johannes and Hilsmann, Anna and Eisert, Peter},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {4868-4877},
  url       = {https://mlanthology.org/iccv/2025/kunzel2025iccv-ripe/}
}