Pix2Poly: A Sequence Prediction Method for End-to-End Polygonal Building Footprint Extraction from Remote Sensing Imagery
Abstract
Extraction of building footprint polygons from remotely sensed data is essential for several urban understanding tasks such as reconstruction, navigation, and mapping. Despite significant progress in the area, extracting accurate polygonal vector building footprints remains an open problem. In this paper, we introduce Pix2Poly, an attention-based, end-to-end trainable and differentiable deep neural network capable of directly generating explicit, high-quality building footprints in a ring graph format. Pix2Poly employs a generative encoder-decoder transformer to produce a sequence of graph vertex tokens whose connectivity information is learned by an optimal matching network. Compared to previous graph learning methods, ours is a truly end-to-end trainable approach that extracts high-quality building footprints and road networks without requiring complicated, computationally intensive raster loss functions and intricate training pipelines. Upon evaluating Pix2Poly on several complex and challenging datasets, we report that Pix2Poly outperforms state-of-the-art methods in several vector shape quality metrics while being an entirely explicit method. Our code is available at https://github.com/yeshwanth95/Pix2Poly.
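The abstract does not spell out how a "ring graph" is turned into polygons. As a rough illustration only, the sketch below assumes that the predicted connectivity can be read as a successor index for every vertex (a permutation of the vertex indices), so that each cycle of the permutation is one closed building outline. The names `trace_rings` and `successor` are hypothetical and are not taken from the Pix2Poly codebase.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def trace_rings(vertices: List[Point], successor: List[int]) -> List[List[Point]]:
    """Group vertices into closed rings by following successor links.

    Assumption: successor[i] is the index of the vertex that follows
    vertex i, and successor is a permutation of range(len(vertices)).
    """
    n = len(vertices)
    if sorted(successor) != list(range(n)):
        raise ValueError("successor must be a permutation of the vertex indices")

    visited = [False] * n
    rings: List[List[Point]] = []
    for start in range(n):
        if visited[start]:
            continue
        # Walk the cycle that contains `start` until it closes on itself.
        ring: List[Point] = []
        i = start
        while not visited[i]:
            visited[i] = True
            ring.append(vertices[i])
            i = successor[i]
        rings.append(ring)
    return rings


# Toy input: a square and a triangle stored in one flat vertex list.
vertices = [(0, 0), (4, 0), (4, 4), (0, 4), (10, 10), (12, 10), (11, 12)]
successor = [1, 2, 3, 0, 5, 6, 4]
rings = trace_rings(vertices, successor)
```

With this toy input, `rings` holds two vertex lists, one per building. Because every vertex has exactly one successor, each outline is closed by construction, which is one plausible reason an explicit ring representation avoids raster post-processing.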
Cite
Text
Adimoolam et al. "Pix2Poly: A Sequence Prediction Method for End-to-End Polygonal Building Footprint Extraction from Remote Sensing Imagery." Winter Conference on Applications of Computer Vision, 2025.
Markdown
[Adimoolam et al. "Pix2Poly: A Sequence Prediction Method for End-to-End Polygonal Building Footprint Extraction from Remote Sensing Imagery." Winter Conference on Applications of Computer Vision, 2025.](https://mlanthology.org/wacv/2025/adimoolam2025wacv-pix2poly/)
BibTeX
@inproceedings{adimoolam2025wacv-pix2poly,
title = {{Pix2Poly: A Sequence Prediction Method for End-to-End Polygonal Building Footprint Extraction from Remote Sensing Imagery}},
author = {Adimoolam, Yeshwanth Kumar and Poullis, Charalambos and Averkiou, Melinos},
booktitle = {Winter Conference on Applications of Computer Vision},
year = {2025},
pages = {8473-8482},
url = {https://mlanthology.org/wacv/2025/adimoolam2025wacv-pix2poly/}
}