Segment Any Events with Language

Abstract

Scene understanding with free-form language has been widely explored across diverse modalities such as images, point clouds, and LiDAR. However, related studies on event sensors are scarce or narrowly centered on semantic-level understanding. We introduce **SEAL**, the first Semantic-aware Segment Any Events framework that addresses Open-Vocabulary Event Instance Segmentation (OV-EIS). Given a visual prompt, our model provides a unified framework that supports both event segmentation and open-vocabulary mask classification at multiple levels of granularity, including instance-level and part-level. To enable thorough evaluation on OV-EIS, we curate four benchmarks that cover *label granularity*, from coarse to fine class configurations, and *semantic granularity*, from instance-level to part-level understanding. Extensive experiments show that SEAL substantially outperforms the proposed baselines in both performance and inference speed while using a parameter-efficient architecture. In the Appendix, we further present a simple variant of SEAL that achieves generic spatiotemporal OV-EIS without requiring any visual prompts from users at inference time. The code will be publicly available.

Cite

Text

Lee and Lee. "Segment Any Events with Language." International Conference on Learning Representations, 2026.

Markdown

[Lee and Lee. "Segment Any Events with Language." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/lee2026iclr-segment/)

BibTeX

@inproceedings{lee2026iclr-segment,
  title     = {{Segment Any Events with Language}},
  author    = {Lee, Seungjun and Lee, Gim Hee},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/lee2026iclr-segment/}
}