$F^3Set$: Towards Analyzing Fast, Frequent, and Fine-Grained Events from Videos

Abstract

Analyzing Fast, Frequent, and Fine-grained ($F^3$) events presents a significant challenge in video analytics and multi-modal LLMs. Current methods struggle to identify events that satisfy all the $F^3$ criteria with high accuracy due to challenges such as motion blur and subtle visual discrepancies. To advance research in video understanding, we introduce $F^3Set$, a benchmark that consists of video datasets for precise $F^3$ event detection. Datasets in $F^3Set$ are characterized by their extensive scale and comprehensive detail, usually encompassing over 1,000 event types with precise timestamps and supporting multi-level granularity. Currently, $F^3Set$ contains several sports datasets, and this framework may be extended to other applications as well. We evaluated popular temporal action understanding methods on $F^3Set$, revealing substantial challenges for existing techniques. Additionally, we propose a new method, $F^3ED$, for $F^3$ event detection, which achieves superior performance. The dataset, model, and benchmark code are available at https://github.com/F3Set/F3Set.

Cite

Liu et al. "$F^3Set$: Towards Analyzing Fast, Frequent, and Fine-Grained Events from Videos." International Conference on Learning Representations, 2025.

BibTeX

@inproceedings{liu2025iclr-3set,
  title     = {{$F^3Set$: Towards Analyzing Fast, Frequent, and Fine-Grained Events from Videos}},
  author    = {Liu, Zhaoyu and Jiang, Kan and Ma, Murong and Hou, Zhe and Lin, Yun and Dong, Jin Song},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/liu2025iclr-3set/}
}