How Many Events Make an Object? Improving Single-Frame Object Detection on the 1 Mpx Dataset

Abstract

Event cameras are promising novel vision sensors with higher dynamic range and higher temporal resolution than frame-based cameras. Single-frame detectors without memory work well on images but perform poorly on event data. We analyze the distribution of event counts within the 2D bounding boxes of the 1 Mpx Dataset and find that it is skewed towards few events, which makes it impossible to detect such objects from current information alone. Memory layers like LSTM can alleviate this problem, but increase training time and inference costs. To bring the advantages of single-frame detectors to event camera data, we propose a data filtering mechanism and a novel bounding box memory. The filtering mechanism excludes labels with low event counts during training, which improves performance on unfiltered test data. The bounding box memory retains bounding boxes until an event threshold is reached, which improves performance, has a low memory and latency footprint, and can be integrated into any object detector without retraining. Improvements are shown on a simulated dataset based on moving MNIST digits, as well as on the 1 Mpx Dataset, the largest event camera object detection dataset to date, illustrating that our method scales to large datasets and works in a complex real-world setting.
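To make the two ideas in the abstract concrete, here is a minimal Python sketch of how label filtering and a bounding box memory might look. It is not the authors' implementation. The names `count_events_in_box`, `filter_labels`, and `BoxMemory`, the `(x_min, y_min, x_max, y_max)` box layout, and the rule that a remembered box is dropped once enough events have accumulated inside it are all assumptions made for illustration; the abstract does not specify these details.

```python
import numpy as np


def count_events_in_box(events_xy, box):
    """Count events whose (x, y) lies inside box = (x_min, y_min, x_max, y_max)."""
    x, y = events_xy[:, 0], events_xy[:, 1]
    x_min, y_min, x_max, y_max = box
    inside = (x >= x_min) & (x < x_max) & (y >= y_min) & (y < y_max)
    return int(inside.sum())


def filter_labels(events_xy, boxes, min_events):
    """Keep only training labels whose box contains at least min_events events."""
    return [b for b in boxes if count_events_in_box(events_xy, b) >= min_events]


class BoxMemory:
    """Hypothetical post-processing memory wrapped around any single-frame detector.

    A detected box is remembered and re-emitted on later frames until the
    number of events accumulated inside it reaches event_threshold
    (assumed forgetting rule).
    """

    def __init__(self, event_threshold):
        self.event_threshold = event_threshold
        self._entries = []  # list of (box, accumulated_event_count)

    def update(self, events_xy, detections):
        kept = []
        for box, count in self._entries:
            if box in detections:
                continue  # refreshed by a new detection below
            count += count_events_in_box(events_xy, box)
            if count < self.event_threshold:
                kept.append((box, count))  # still too few events: keep remembering
        # Fresh detections (re)start with an empty event count.
        kept.extend((box, 0) for box in detections)
        self._entries = kept
        return [box for box, _ in kept]


# Toy usage: three events, two labelled boxes.
events = np.array([[5, 5], [6, 6], [50, 50]])
labels = [(0, 0, 10, 10), (40, 40, 60, 60)]
train_labels = filter_labels(events, labels, min_events=2)

memory = BoxMemory(event_threshold=3)
frame1 = memory.update(events, [(0, 0, 10, 10)])  # detector fires
frame2 = memory.update(events, [])                # detector misses, box is remembered
frame3 = memory.update(events, [])                # threshold reached, box is dropped
```

In this sketch, boxes are matched by exact equality to keep the code short; a practical variant would more likely match remembered and new boxes by overlap (for example IoU). Because the memory only post-processes detector outputs, it needs no retraining, which is the property the abstract highlights.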

Cite

Text

Kugele et al. "How Many Events Make an Object? Improving Single-Frame Object Detection on the 1 Mpx Dataset." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023. doi:10.1109/CVPRW59228.2023.00406

Markdown

[Kugele et al. "How Many Events Make an Object? Improving Single-Frame Object Detection on the 1 Mpx Dataset." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023.](https://mlanthology.org/cvprw/2023/kugele2023cvprw-many/) doi:10.1109/CVPRW59228.2023.00406

BibTeX

@inproceedings{kugele2023cvprw-many,
  title     = {{How Many Events Make an Object? Improving Single-Frame Object Detection on the 1 Mpx Dataset}},
  author    = {Kugele, Alexander and Pfeil, Thomas and Pfeiffer, Michael and Chicca, Elisabetta},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2023},
  pages     = {3913--3922},
  doi       = {10.1109/CVPRW59228.2023.00406},
  url       = {https://mlanthology.org/cvprw/2023/kugele2023cvprw-many/}
}