CNN-Transformer with Absolute Positional Encoding Optimized for Low-Dimensional Inputs: Applied to Estimate Sliding Drop Width

Abstract

High-speed video recordings are crucial for investigating drop dynamics and their interactions with surfaces. Measuring the width of sliding drops, a key parameter linked to frictional forces, requires additional equipment like cameras or mirrors, complicating experimental setups and limiting observable areas. This study introduces a novel method that simplifies the measurement process by employing artificial neural networks to estimate millimeter-scale drop width directly from side-view video data. Our approach processes raw video footage to dynamically identify the features most indicative of drop width. By treating drop behavior as an extrinsic time-series problem, our model effectively captures temporal dependencies in video sequences. We propose a VGG8-inspired architecture optimized for small video datasets with low information density. This architecture is combined with our novel position-invariant video processing methodology that efficiently removes non-essential regions, reducing computation time by 84%. We further integrate ConvTran, a state-of-the-art time-series classification model, with an enhanced Absolute Position Encoding, improving the encoding's dot-product and lowering drop width estimation errors. Our novel neural network architecture achieved a root mean square error of 48 μm (1.7% relative error), where each pixel corresponds to approximately 44 μm. Code and data are open-sourced at: https://github.com/shumaly/position_invariant_cnn_transformer.
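For context on the encoding the abstract refers to: the paper's enhanced Absolute Position Encoding builds on the standard sinusoidal formulation used in Transformers. A minimal sketch of that textbook baseline (not the paper's enhanced variant) looks like this, assuming a sequence of `seq_len` time steps embedded into `d_model` dimensions:

```python
import numpy as np

def absolute_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Standard sinusoidal absolute positional encoding (the textbook
    Transformer formulation). Shown only to illustrate the baseline that
    enhanced APE variants, such as the one in this paper, modify."""
    positions = np.arange(seq_len)[:, None]        # shape (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]       # even dimension indices
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                   # sine on even dims
    pe[:, 1::2] = np.cos(angles)                   # cosine on odd dims
    return pe

# Example: encode 16 time steps (e.g. video frames) into 64 dimensions.
pe = absolute_positional_encoding(16, 64)
```

The encoding is added to the frame embeddings before attention; the dot-product between two such encodings depends on the relative distance between positions, which is the property the paper's enhancement targets for low-dimensional inputs.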

Cite

Text

Shumaly et al. "CNN-Transformer with Absolute Positional Encoding Optimized for Low-Dimensional Inputs: Applied to Estimate Sliding Drop Width." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025. doi:10.1007/978-3-032-06118-8_1

Markdown

[Shumaly et al. "CNN-Transformer with Absolute Positional Encoding Optimized for Low-Dimensional Inputs: Applied to Estimate Sliding Drop Width." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025.](https://mlanthology.org/ecmlpkdd/2025/shumaly2025ecmlpkdd-cnntransformer/) doi:10.1007/978-3-032-06118-8_1

BibTeX

@inproceedings{shumaly2025ecmlpkdd-cnntransformer,
  title     = {{CNN-Transformer with Absolute Positional Encoding Optimized for Low-Dimensional Inputs: Applied to Estimate Sliding Drop Width}},
  author    = {Shumaly, Sajjad and Darvish, Fahimeh and Salehi, Mahsa and Foumani, Navid Mohammadi and Kukharenko, Oleksandra and Butt, Hans-Jürgen and Schwanecke, Ulrich and Berger, Rüdiger},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2025},
  pages     = {3--21},
  doi       = {10.1007/978-3-032-06118-8_1},
  url       = {https://mlanthology.org/ecmlpkdd/2025/shumaly2025ecmlpkdd-cnntransformer/}
}