EV-LayerSegNet: Self-Supervised Motion Segmentation Using Event Cameras

Abstract

Event cameras are novel bio-inspired sensors that capture motion dynamics with much higher temporal resolution than traditional cameras, since their pixels react asynchronously to brightness changes. They are therefore well suited to motion-centric tasks such as motion segmentation. However, training event-based networks remains challenging, as obtaining ground truth is expensive, error-prone, and limited in frequency. In this article, we introduce EV-LayerSegNet, a self-supervised CNN for event-based motion segmentation. Inspired by a layered representation of scene dynamics, we show that affine optical flow and segmentation masks can be learned separately and then combined to deblur the input events. The deblurring quality is measured and used as the self-supervised learning loss. We train and test the network on a simulated dataset containing only affine motion, achieving an IoU and detection rate of up to 71% and 87%, respectively.
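The deblurring objective described above is in the spirit of contrast maximization: events are warped along a candidate motion to a reference time, accumulated into an image, and the image's sharpness (e.g. variance) scores how well the motion explains the events. The sketch below is a minimal, hypothetical illustration of that idea under a single constant 2D flow; the paper's actual loss operates on per-layer affine flows weighted by learned segmentation masks, which are not reproduced here.

```python
import numpy as np

def warp_events(xs, ys, ts, flow, t_ref=0.0):
    """Warp each event (x, y, t) to t_ref along a constant 2D flow (vx, vy)."""
    dt = ts - t_ref
    return xs - flow[0] * dt, ys - flow[1] * dt

def event_image(xs, ys, shape):
    """Accumulate warped events into a count image (nearest-pixel for brevity;
    a differentiable version would use bilinear splatting)."""
    img = np.zeros(shape)
    xi = np.clip(np.round(xs).astype(int), 0, shape[1] - 1)
    yi = np.clip(np.round(ys).astype(int), 0, shape[0] - 1)
    np.add.at(img, (yi, xi), 1.0)
    return img

def contrast(img):
    """Variance of the image of warped events: high when events align (sharp),
    low when they are smeared along the trajectory (blurred)."""
    return float(np.var(img))

# Toy example: a point moving with velocity (1, 0) emits one event per timestep.
ts = np.arange(10, dtype=float)
xs, ys = ts.copy(), np.full(10, 5.0)

sharp = contrast(event_image(*warp_events(xs, ys, ts, (1.0, 0.0)), (12, 12)))
blurry = contrast(event_image(*warp_events(xs, ys, ts, (0.0, 0.0)), (12, 12)))
```

With the correct flow all events collapse onto one pixel, so `sharp > blurry`; a self-supervised network can therefore be trained by maximizing this contrast (or minimizing its negative) without ground-truth labels.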

Cite

Text

Farah et al. "EV-LayerSegNet: Self-Supervised Motion Segmentation Using Event Cameras." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.

Markdown

[Farah et al. "EV-LayerSegNet: Self-Supervised Motion Segmentation Using Event Cameras." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.](https://mlanthology.org/cvprw/2025/farah2025cvprw-evlayersegnet/)

BibTeX

@inproceedings{farah2025cvprw-evlayersegnet,
  title     = {{EV-LayerSegNet: Self-Supervised Motion Segmentation Using Event Cameras}},
  author    = {Farah, Youssef and Paredes-Vallés, Federico and de Croon, Guido and Humais, Muhammad Ahmed and Sajwani, Hussain M. and Zweiri, Yahya H.},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2025},
  pages     = {5126--5135},
  url       = {https://mlanthology.org/cvprw/2025/farah2025cvprw-evlayersegnet/}
}