Abstract:
Photo-realistic editing of human facial expressions and head articulations is a long-standing topic in the computer graphics and computer vision communities. Methods enabling such control have great potential in AR/VR applications, where a 3D immersive experience is valuable, especially when this control extends to novel views of the scene in which the human subject is captured. Traditionally, 3D Morphable Face Models (3DMMs) are used to control the facial expressions and head pose of a human subject. However, the PCA-based shape and expression spaces of 3DMMs lack the expressivity to model a large variety of face shapes and expressions. They also cannot model essential elements of the human head, such as hair, skin details, and accessories like glasses, which are paramount for realistic reanimation. Further, 3DMMs model only the head region and lack the ability to represent the full scene realistically. In this proposal, we present a set of methods that enable facial reanimation, starting from editing expressions in still face images and progressing to fully controllable Neural 3D portraits with control over facial expressions, head pose, and the viewing direction of the scene, using only casually captured monocular videos from a smartphone.
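As background on the 3DMM formulation referenced above: a 3DMM generates a face mesh as a mean shape plus linear combinations of PCA shape and expression bases. The following is a minimal Python sketch of that linear model; the vertex count, component counts, and random placeholder bases are illustrative assumptions rather than values from any particular 3DMM.

import numpy as np

# Minimal 3DMM-style linear model (illustrative sizes; a real model uses
# bases obtained by PCA over registered 3D face scans, not random arrays).
N_VERTS = 5023              # assumed number of mesh vertices
N_SHAPE, N_EXPR = 100, 50   # assumed number of shape / expression components

mean_shape = np.zeros((N_VERTS, 3))
shape_basis = np.random.randn(N_SHAPE, N_VERTS, 3) * 1e-3
expr_basis = np.random.randn(N_EXPR, N_VERTS, 3) * 1e-3

def morphable_face(shape_coeffs, expr_coeffs):
    # Return an (N_VERTS, 3) mesh: mean shape + shape offsets + expression offsets.
    shape_offset = np.tensordot(shape_coeffs, shape_basis, axes=1)
    expr_offset = np.tensordot(expr_coeffs, expr_basis, axes=1)
    return mean_shape + shape_offset + expr_offset

# Example: neutral identity with a mild expression.
verts = morphable_face(np.zeros(N_SHAPE), 0.5 * np.ones(N_EXPR))
print(verts.shape)  # (5023, 3)

Because such bases span only low-frequency face-region geometry, a model of this form cannot represent hair, fine skin detail, accessories, or the surrounding scene, which is what motivates the neural-field methods described next.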
First, we propose a method for editing facial expressions in close-to-frontal facial images through unsupervised disentangling of expression-induced deformations and texture changes. Next, we extend facial expression editing to human subjects in 3D scenes. We represent the scene and the subject in it using a semantically guided neural field, enabling control not only over the subject's facial expressions but also over the viewing direction of the scene they are in. We then present a method that learns, in an unsupervised manner, to deform static 3D neural fields using expression- and head-pose-dependent deformations, enabling control over the facial expressions and head pose of the subject along with the viewing direction of the 3D scene they are in. Next, we propose a method that makes the learning of this deformation field robust to strong illumination effects, which otherwise adversely impact the registration of the deformations. We then extend this unsupervised deformation model to 3D Gaussian splatting by constraining it with a 3D morphable model, resulting in a rendering speed of almost 18 FPS, a 100x improvement over prior work. Finally, we propose creating a relightable and reanimatable Neural 3D portrait by using the dynamics of the human head in the monocular training video to sample illumination-dependent effects on it, allowing us to disentangle the appearance of the face into relightable assets and the environment light.
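For intuition on the deformation-field idea above, the sketch below shows one common way such a field can be structured: an MLP maps a 3D sample point together with expression and head-pose codes to an offset that warps the point into a canonical static field before that field is queried. The module names, dimensions, and the stand-in canonical field are illustrative assumptions, not the proposed architecture.

import torch
import torch.nn as nn

class DeformationField(nn.Module):
    # Maps (3D point, expression code, head pose) to a 3D offset (illustrative).
    def __init__(self, expr_dim=50, pose_dim=6, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + expr_dim + pose_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, x, expr, pose):
        # x: (N, 3) sample points; expr: (N, expr_dim); pose: (N, pose_dim)
        return self.mlp(torch.cat([x, expr, pose], dim=-1))

def query_dynamic_field(canonical_field, deform, x, expr, pose):
    # Warp observed points into the canonical (static) field, then query it.
    x_canonical = x + deform(x, expr, pose)
    return canonical_field(x_canonical)   # e.g. density/color of a static NeRF

# Toy usage with a placeholder canonical field.
deform = DeformationField()
canonical_field = lambda pts: pts.norm(dim=-1, keepdim=True)
x = torch.rand(1024, 3)
out = query_dynamic_field(canonical_field, deform,
                          x, torch.zeros(1024, 50), torch.zeros(1024, 6))
print(out.shape)  # torch.Size([1024, 1])

In this arrangement only the deformation network depends on expression and head pose; the canonical field remains static, which is what allows a single casually captured monocular video to supervise both.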
Date
Tuesday, June 25, 2024, 01:00pm to 03:00pm
Location
NCS 220
Event Title
Ph.D. Proposal Defense: Shahrukh Athar