Dates
Tuesday, May 24, 2022 - 02:00pm to Tuesday, May 24, 2022 - 03:30pm
Location
NCS 220
Event Description

Abstract: Document images captured with mobile devices often contain geometric and photometric artifacts caused by the physical shape of the paper, the camera pose, or complex lighting conditions. Unlike images captured with high fidelity on flatbed scanners, mobile-captured documents are therefore ill-suited for digitization tasks such as Optical Character Recognition (OCR), table detection, or key-value extraction. Document unwarping and illumination correction aim to rectify these geometric and photometric artifacts to obtain flatbed-scanner-like document images. Recently, deep-learning-based models have achieved performance breakthroughs on these tasks; they rely on large synthetic document datasets to implicitly learn and extract valuable features. However, these models ignore the physical aspects of paper folding in 3D, so they often generalize poorly and produce inconsistent results. I propose to use physical cues and constraints in deep-learning-based document unwarping and illumination correction models. These additional physical assumptions enable the network to adhere closely to the physics of document folding and image formation, thereby achieving better generalizability. In this thesis, I first present two document unwarping models that use the 3D shape of the document as an intermediate representation in a global and a local manner. Second, I present the first unwarping model that obviates the need for a large-scale training dataset and instead uses multi-view images to learn an unwarping map within a differentiable rendering framework. This new framework introduces isometric constraints corresponding to the physical properties of paper. Interestingly, the model also generalizes the unwarping task to other everyday warped objects, such as fabric and soda cans. Third, I present two models that remove illumination artifacts from documents. We assume a Lambertian illumination model and learn to disentangle a document image's illumination color and shading. Our system produces high-quality, shading-free, scanner-like results and improves OCR accuracy by a significant margin. Finally, I propose extending the multi-view, image-based unwarping with an explicit illumination model, enabling unwarping and illumination correction in a single framework.
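To make the Lambertian assumption mentioned above concrete, the sketch below shows one common way such an image-formation model is written: the captured image is the element-wise product of the shading-free document (reflectance), a per-pixel shading field, and a global illumination color, and inverting that product yields a scanner-like result. This is a minimal illustration under those assumptions, not code from the thesis; all function and variable names are hypothetical.

```python
import numpy as np

# Illustrative Lambertian forward model:
#   captured image I = R * (c * s)
# where R is the shading-free document (H, W, 3), s is a per-pixel
# gray-scale shading field (H, W), and c is a global illumination color (3,).

def compose(reflectance, shading, illum_color):
    """Forward model: multiply reflectance by the colored shading, per pixel."""
    return reflectance * (shading[..., None] * illum_color)

def remove_shading(image, shading, illum_color, eps=1e-6):
    """Invert the forward model to recover a scanner-like, shading-free image."""
    return image / np.maximum(shading[..., None] * illum_color, eps)

# Example: a white page under a smooth shading ramp and warm-tinted light.
page = np.ones((4, 4, 3))                        # ideal white document
shade = np.linspace(0.4, 1.0, 16).reshape(4, 4)  # synthetic shading field
color = np.array([1.0, 0.95, 0.8])               # warm illumination color
captured = compose(page, shade, color)
restored = remove_shading(captured, shade, color)  # approximately equals page
```

In the thesis the shading and illumination color are predicted by learned models rather than given, but the disentanglement target is the same kind of decomposition shown here.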

Event Title
Ph.D. Proposal Defense: Sagnik Das, 'From In-the-Wild Images of Documents to High Quality Scans'