Abstract: Mobile device-captured document images often contain geometric and photometric artifacts due to the physical shape of the paper, camera pose, or complex lighting conditions. Therefore, unlike images captured with high-fidelity flatbed scanners, mobile-captured documents are ill-suited for digitization tasks such as Optical Character Recognition (OCR), table detection, or key-value extraction. Document unwarping and illumination correction aim to rectify the geometric and photometric artifacts to obtain flatbed scanner-like document images. Recently, deep-learning-based models have achieved performance breakthroughs for these tasks. They rely on large synthetic document datasets to implicitly learn and extract valuable features. However, these models ignore the physical aspects of paper folding in 3D. As a result, they often generalize poorly and produce inconsistent results. In this thesis, I use physical cues and constraints in deep learning-based document unwarping and illumination correction models. The additional physical assumptions enable the network to closely adhere to the physics of document folding and image formation, and therefore achieve better generalizability. First, I present two document unwarping models that use the 3D shape of the document as an intermediate representation in a global and local manner. Second, I present the first unwarping model that obviates the need for a large-scale training dataset and instead utilizes multi-view images to learn an unwarping map within a differentiable rendering framework. This new framework introduces isometric constraints corresponding to the physical properties of paper. Interestingly, this model also generalizes the unwarping task to other everyday objects, such as fabric and soda cans. Third, I present two models to remove illumination artifacts from documents. I assume a Lambertian illumination model and learn to disentangle a document image's illumination color and shading.
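The isometric constraint mentioned above reflects the physical property that paper bends but does not stretch: under deformation, distances along the sheet are preserved. A minimal sketch of such a constraint as a mesh-edge-length penalty is shown below; the function name, mesh layout, and exact loss form are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def isometry_loss(verts, edges, rest_lengths):
    """Penalize deviation of deformed mesh-edge lengths from their
    flat-paper rest lengths (paper bends but does not stretch).

    verts: (V, 3) deformed vertex positions
    edges: (E, 2) vertex-index pairs
    rest_lengths: (E,) edge lengths of the flat sheet
    """
    v0 = verts[edges[:, 0]]
    v1 = verts[edges[:, 1]]
    lengths = np.linalg.norm(v0 - v1, axis=1)
    return np.mean((lengths - rest_lengths) ** 2)

# Toy example: a flat unit square with four unit-length boundary edges.
# Any rigid fold of this sheet keeps the edge lengths at 1, so the
# isometry loss stays zero; stretching the sheet would increase it.
verts = np.array([[0., 0., 0.],
                  [1., 0., 0.],
                  [0., 1., 0.],
                  [1., 1., 0.]])
edges = np.array([[0, 1], [0, 2], [1, 3], [2, 3]])
rest = np.ones(4)
print(isometry_loss(verts, edges, rest))  # 0.0 for an unstretched sheet
```

In a differentiable rendering pipeline, a term of this form would be added to the rendering loss so that gradient descent keeps the recovered 3D surface physically plausible.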
Finally, I extend the multi-view image-based unwarping with an explicit illumination model, allowing unwarping and illumination correction in a single framework. This joint framework benefits from the additional physical constraints introduced by surface-normal and light-direction modeling: it produces high-quality, shading-removed, scanner-like results and improves quantitative metrics by a significant margin.
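The Lambertian assumption above can be illustrated with a toy image-formation sketch: an observed document pixel is modeled as the product of the intrinsic albedo (the "clean" document), a scalar shading term, and a global illumination color, and inverting that product recovers the shading-free document. The function names, shapes, and decomposition below are illustrative assumptions, not the thesis architecture.

```python
import numpy as np

def compose_lambertian(albedo, shading, light_color):
    """Render an observed image as I = albedo * shading * light_color.

    albedo: (H, W, 3) intrinsic document colors
    shading: (H, W, 1) per-pixel scalar shading
    light_color: (3,) global illumination color
    """
    return albedo * shading * light_color

def recover_albedo(observed, shading, light_color, eps=1e-6):
    """Invert the model to recover the shading-free document."""
    return observed / np.maximum(shading * light_color, eps)

# Toy example: white paper under a warm light with a uniform half shadow.
albedo = np.ones((2, 2, 3))            # white page
shading = np.full((2, 2, 1), 0.5)      # uniform shadow
light = np.array([1.0, 0.9, 0.8])      # warm illumination color

observed = compose_lambertian(albedo, shading, light)
restored = recover_albedo(observed, shading, light)
print(np.allclose(restored, albedo))   # True
```

In the learned setting, the shading and illumination color are of course unknown and must be predicted by the network; the point of the explicit model is that the predicted factors must multiply back to the observed image, which constrains the disentanglement.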
Dates
Wednesday, November 02, 2022 - 11:30am to 1:30pm
Location
New Computer Science, Room 220
Event Title
Ph.D. Thesis Defense: Sagnik Das, 'From In-the-Wild Images of Documents to High Quality Scans'