Integration of Large Vision Models in Driver Monitoring Systems: Compressing and Distilling for Real-Time Automotive Applications

Abstract

This study focuses on optimizing neural network architectures for real-time detection of driver facial bounding boxes. Initially, we trained the Florence-2 model, which demonstrated high accuracy but proved too large for real-time applications. To address this, we employed model distillation, using Florence-2 as a teacher to train a more compact DINOv2 model. Our aim was to maintain high detection accuracy while minimizing memory usage and inference time, making the solution viable for real-time implementation on GPU and NPU devices. We present a comparative analysis of model performance in terms of IoU scores, memory consumption and inference times.
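
The abstract compares models by their IoU scores on driver face bounding boxes. As a minimal sketch of that metric, assuming axis-aligned boxes in `(x1, y1, x2, y2)` pixel format (the paper's exact box convention is not stated here, and the function name `box_iou` is purely illustrative), intersection over union could be computed as follows:

```python
def box_iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are assumed to be (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
    This format is an assumption for illustration, not taken from the paper.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped to zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = max(0.0, ax2 - ax1) * max(0.0, ay2 - ay1)
    area_b = max(0.0, bx2 - bx1) * max(0.0, by2 - by1)
    union = area_a + area_b - inter

    # Two degenerate (zero-area) boxes have no meaningful overlap.
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = (0, 0, 2, 2)      # hypothetical student-model prediction
    ground_truth = (1, 1, 3, 3)   # hypothetical annotated face box
    print(f"IoU = {box_iou(predicted, ground_truth):.4f}")
```

A score of 1.0 means the predicted box matches the reference exactly, and 0.0 means the two boxes do not overlap at all.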

Cite

Text

Chatziloizos et al. "Integration of Large Vision Models in Driver Monitoring Systems: Compressing and Distilling for Real-Time Automotive Applications." NeurIPS 2024 Workshops: Compression, 2024.

Markdown

[Chatziloizos et al. "Integration of Large Vision Models in Driver Monitoring Systems: Compressing and Distilling for Real-Time Automotive Applications." NeurIPS 2024 Workshops: Compression, 2024.](https://mlanthology.org/neuripsw/2024/chatziloizos2024neuripsw-integration/)

BibTeX

@inproceedings{chatziloizos2024neuripsw-integration,
  title     = {{Integration of Large Vision Models in Driver Monitoring Systems: Compressing and Distilling for Real-Time Automotive Applications}},
  author    = {Chatziloizos, Georgios Markos and Ancora, Andrea and Comport, Andrew I. and Barat, Christian},
  booktitle = {NeurIPS 2024 Workshops: Compression},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/chatziloizos2024neuripsw-integration/}
}