Semantic-Drive: Trustworthy and Efficient Long-Tail Data Curation via Open-Vocabulary Grounding and Neuro-Symbolic VLM Consensus

Abstract

The development of Autonomous Vehicles (AVs) is currently hampered by a scarcity of long-tail training data. While fleets collect petabytes of video logs, identifying rare safety-critical events, such as erratic jaywalking or complex construction diversions, remains a manual process that is often cost-prohibitive. Existing automated solutions rely either on coarse metadata search, which lacks semantic precision, or on cloud-based Vision-Language Models (VLMs), which introduce privacy concerns and computational overhead. In this work, we introduce Semantic-Drive, a local-first, neuro-symbolic framework designed for verifiable semantic data mining. Our approach decouples perception into two distinct stages: (1) Symbolic Grounding, in which a real-time open-vocabulary detector (YOLOE) anchors attention, and (2) Cognitive Analysis, in which a Reasoning VLM performs forensic scene analysis. To reduce the hallucinations and reliability issues common in generative models, we implement a "System 2" inference-time alignment strategy built on a multi-model "Judge-Scout" consensus mechanism. Benchmarked on the nuScenes dataset against the Waymo Open Dataset (WOD-E2E) taxonomy, Semantic-Drive achieves a recall of 0.966 on safety-critical scenarios (vs. 0.331 for OWL-v2 and 0.271 for Grounding DINO). Notably, the system reduces risk assessment error by 40% compared to single-model baselines. The entire pipeline runs on consumer hardware (NVIDIA RTX 3090), offering an accessible and privacy-preserving alternative to cloud-native architectures.
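
The abstract does not spell out how the "Judge-Scout" consensus combines model outputs, so the following is only a minimal illustrative sketch of one way such a step could work, not the paper's implementation. It assumes that each scout model emits a risk score and a set of scenario tags, that the judge keeps a tag only if enough scouts agree on it and the open-vocabulary detector also grounded it, and that the judge takes the median risk. The names `ScoutReport`, `judge`, and `min_votes`, the 0-10 risk scale, and the voting rule are all hypothetical.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class ScoutReport:
    """One scout model's scene description (hypothetical structure)."""
    model: str
    risk: float            # assumed 0-10 risk scale, not from the paper
    tags: frozenset[str]   # scenario tags the scout claims to see


def judge(reports: list[ScoutReport], grounded: set[str], min_votes: int = 2) -> dict:
    """Merge scout reports into one verdict (assumed rule, for illustration).

    A tag survives only if at least `min_votes` scouts claim it AND the
    detector grounded it; the final risk is the median of the scout risks.
    """
    votes: dict[str, int] = {}
    for report in reports:
        for tag in report.tags:
            votes[tag] = votes.get(tag, 0) + 1
    kept = sorted(t for t, n in votes.items() if n >= min_votes and t in grounded)
    return {"tags": kept, "risk": median(r.risk for r in reports)}


if __name__ == "__main__":
    reports = [
        ScoutReport("scout_a", 7.0, frozenset({"pedestrian", "construction"})),
        ScoutReport("scout_b", 8.0, frozenset({"pedestrian"})),
        ScoutReport("scout_c", 3.0, frozenset({"pedestrian", "unicorn"})),
    ]
    # Labels the detector actually found in the frame (made-up example values).
    print(judge(reports, grounded={"pedestrian", "traffic cone"}))
```

In this sketch, a tag claimed by a single scout or never grounded by the detector is dropped, which is one simple way to keep an ungrounded claim out of the final label; the median keeps one outlier risk estimate from dominating the verdict.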

Cite

Text

Guillen-Perez. "Semantic-Drive: Trustworthy and Efficient Long-Tail Data Curation via Open-Vocabulary Grounding and Neuro-Symbolic VLM Consensus." Transactions on Machine Learning Research, 2026.

Markdown

[Guillen-Perez. "Semantic-Drive: Trustworthy and Efficient Long-Tail Data Curation via Open-Vocabulary Grounding and Neuro-Symbolic VLM Consensus." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/guillenperez2026tmlr-semanticdrive/)

BibTeX

@article{guillenperez2026tmlr-semanticdrive,
  title     = {{Semantic-Drive: Trustworthy and Efficient Long-Tail Data Curation via Open-Vocabulary Grounding and Neuro-Symbolic VLM Consensus}},
  author    = {Guillen-Perez, Antonio},
  journal   = {Transactions on Machine Learning Research},
  year      = {2026},
  url       = {https://mlanthology.org/tmlr/2026/guillenperez2026tmlr-semanticdrive/}
}