Tsolakis, Georgios
1 publication
NeurIPS
2023
WordScape: A Pipeline to Extract Multilingual, Visually Rich Documents with Layout Annotations from Web Crawl Data
Maurice Weber, Carlo Siebenschuh, Rory Butler, Anton Alexandrov, Valdemar Thanner, Georgios Tsolakis, Haris Jabbar, Ian Foster, Bo Li, Rick Stevens, Ce Zhang