BirdSAT: Cross-View Contrastive Masked Autoencoders for Bird Species Classification and Mapping

Abstract

We propose a metadata-aware self-supervised learning (SSL) framework useful for fine-grained classification and ecological mapping of bird species around the world. Our framework unifies two SSL strategies, Contrastive Learning (CL) and Masked Image Modeling (MIM), while also enriching the embedding space with the meta-information available with ground-level imagery of birds. We separately train uni-modal and cross-modal ViTs on a novel cross-view global bird species dataset containing ground-level imagery, metadata (location, time), and corresponding satellite imagery. We demonstrate that our models learn fine-grained and geographically conditioned features of birds by evaluating on two downstream tasks: fine-grained visual classification (FGVC) and cross-modal retrieval. Pre-trained models learned using our framework achieve SotA performance on FGVC of iNAT-2021 birds as well as in the transfer learning setting for the CUB-200-2011 and NABirds datasets. Moreover, the impressive cross-modal retrieval performance of our model enables the creation of species distribution maps across any geographic region. The dataset and source code will be released at https://github.com/TBD.
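
To make the idea of unifying CL and MIM concrete, the following is a minimal sketch of how a contrastive term and a masked-reconstruction term could be summed into one training objective. It is not the authors' implementation: the symmetric InfoNCE form, the mean-squared-error reconstruction term, the function names (`info_nce`, `masked_mse`, `joint_loss`), and the parameters `temperature` and `mim_weight` are all assumptions chosen for illustration, and plain Python lists stand in for model outputs.

```python
import math


def info_nce(ground, satellite, temperature=0.1):
    """Symmetric InfoNCE loss; row i of each input is assumed to be a positive pair."""
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    g = [normalize(v) for v in ground]
    s = [normalize(v) for v in satellite]
    n = len(g)
    # Cosine-similarity logits between every ground/satellite pair.
    logits = [
        [sum(a * b for a, b in zip(g[i], s[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy_diag(rows):
        # Cross-entropy where the correct class for row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(transposed))


def masked_mse(target_patches, predicted_patches, mask):
    """Mean squared error over masked patches only (mask[i] truthy = patch i was hidden)."""
    total, count = 0.0, 0
    for target, predicted, hidden in zip(target_patches, predicted_patches, mask):
        if hidden:
            total += sum((a - b) ** 2 for a, b in zip(target, predicted))
            count += len(target)
    return total / count if count else 0.0


def joint_loss(ground, satellite, target_patches, predicted_patches, mask,
               mim_weight=1.0, temperature=0.1):
    """Hypothetical combined objective: contrastive term plus weighted reconstruction term."""
    contrastive = info_nce(ground, satellite, temperature)
    reconstruction = masked_mse(target_patches, predicted_patches, mask)
    return contrastive + mim_weight * reconstruction
```

In this sketch the contrastive term pulls matching ground-level and satellite embeddings together, while the reconstruction term only penalizes errors on patches that were hidden from the encoder. How the two terms are actually weighted and how metadata enters the embedding are not specified in the abstract and are left out here.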

Cite

Text

Sastry et al. "BirdSAT: Cross-View Contrastive Masked Autoencoders for Bird Species Classification and Mapping." Winter Conference on Applications of Computer Vision, 2024.

Markdown

[Sastry et al. "BirdSAT: Cross-View Contrastive Masked Autoencoders for Bird Species Classification and Mapping." Winter Conference on Applications of Computer Vision, 2024.](https://mlanthology.org/wacv/2024/sastry2024wacv-birdsat/)

BibTeX

@inproceedings{sastry2024wacv-birdsat,
  title     = {{BirdSAT: Cross-View Contrastive Masked Autoencoders for Bird Species Classification and Mapping}},
  author    = {Sastry, Srikumar and Khanal, Subash and Dhakal, Aayush and Huang, Di and Jacobs, Nathan},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2024},
  pages     = {7136--7145},
  url       = {https://mlanthology.org/wacv/2024/sastry2024wacv-birdsat/}
}