Document Enhancement System Using Auto-Encoders

Abstract

The conversion of scanned documents to digital form is performed using Optical Character Recognition (OCR) software. This work focuses on improving the quality of scanned documents in order to improve OCR output. We create an end-to-end document enhancement pipeline that takes in a set of noisy documents and produces clean ones. Denoising auto-encoders based on deep neural networks are trained to improve OCR quality. We train a blind model that works across different noise levels of scanned text documents. Results are shown for the removal of blurring and watermark noise from noisy scanned documents.
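
To make the idea of a "blind" model concrete, the sketch below shows one plausible way to build training pairs for a denoising auto-encoder: each clean page is degraded with a blur whose strength is drawn at random, so a single model sees many noise levels without being told which one was applied. The abstract does not specify the noise model, the blur range, or the data format, so the Gaussian blur, the `sigma_range` values, and the names `gaussian_kernel`, `blur`, and `make_blind_pairs` are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def gaussian_kernel(sigma: float) -> np.ndarray:
    """Return a normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(np.ceil(3.0 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def blur(image: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur of a 2-D grayscale image (edge-padded)."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    padded = np.pad(image, radius, mode="edge")
    # Convolve along rows, then columns; mode="valid" trims the padding.
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")


def make_blind_pairs(clean_pages, sigma_range=(0.5, 3.0), seed=0):
    """Pair each clean page with a copy blurred at a randomly drawn level.

    The blur level is recorded only for inspection; a blind model would be
    trained on (noisy, clean) alone. sigma_range is a hypothetical choice.
    """
    rng = np.random.default_rng(seed)
    pairs = []
    for page in clean_pages:
        sigma = float(rng.uniform(*sigma_range))
        noisy = np.clip(blur(page, sigma), 0.0, 1.0)
        pairs.append((noisy, page, sigma))
    return pairs


# Synthetic stand-in for a scanned page: white background, two dark "text" bars.
page = np.ones((32, 48), dtype=np.float64)
page[8:12, 4:44] = 0.0
page[20:24, 4:30] = 0.0

pairs = make_blind_pairs([page, page, page])
```

Each `(noisy, clean)` pair would serve as the input and target of the auto-encoder. A watermark degradation could be added in the same way by swapping `blur` for a function that overlays a pattern at a random opacity.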

Cite

Text

Gangeh et al. "Document Enhancement System Using Auto-Encoders." NeurIPS 2019 Workshops: Document_Intelligence, 2019.

Markdown

[Gangeh et al. "Document Enhancement System Using Auto-Encoders." NeurIPS 2019 Workshops: Document_Intelligence, 2019.](https://mlanthology.org/neuripsw/2019/gangeh2019neuripsw-document/)

BibTeX

@inproceedings{gangeh2019neuripsw-document,
  title     = {{Document Enhancement System Using Auto-Encoders}},
  author    = {Gangeh, Mehrdad J. and Tiyyagura, Sunil R. and Dasaratha, Sridhar V. and Motahari, Hamid and Duffy, Nigel P.},
  booktitle = {NeurIPS 2019 Workshops: Document_Intelligence},
  year      = {2019},
  url       = {https://mlanthology.org/neuripsw/2019/gangeh2019neuripsw-document/}
}