Generating faithful visualizations of human faces requires capturing both coarse and fine-level details of the face geometry and appearance. Existing methods are either data-driven, requiring an extensive corpus of data not publicly accessible to the research community, or fail to capture fine details because they rely on geometric face models whose mesh discretization and linear deformation are designed to model only coarse face geometry and therefore cannot represent fine-grained texture details. We introduce a method that bridges this gap by drawing inspiration from traditional computer graphics techniques. Unseen expressions are modeled by blending appearance from a sparse set of extreme poses. This blending is performed by measuring local volumetric changes in those expressions and locally reproducing their appearance whenever a similar expression is performed at test time. We show that our method generalizes to unseen expressions, adding fine-grained effects on top of smooth volumetric deformations of a face, and demonstrate how it generalizes beyond faces.
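To make the blending idea concrete, here is a minimal sketch of how local volume change could drive per-region blending weights. It is an illustration under stated assumptions, not the paper's implementation: the function names, the use of a single tetrahedron as the local region, the softmax weighting, and the `temperature` parameter are all hypothetical choices made for this example.

```python
import math


def tet_volume(p0, p1, p2, p3):
    """Unsigned volume of a tetrahedron given four 3D vertices."""
    a = [p1[i] - p0[i] for i in range(3)]
    b = [p2[i] - p0[i] for i in range(3)]
    c = [p3[i] - p0[i] for i in range(3)]
    # Scalar triple product a . (b x c), written out as a 3x3 determinant.
    det = (a[0] * (b[1] * c[2] - b[2] * c[1])
           - a[1] * (b[0] * c[2] - b[2] * c[0])
           + a[2] * (b[0] * c[1] - b[1] * c[0]))
    return abs(det) / 6.0


def volume_change(rest, deformed):
    """Ratio of deformed to rest volume for one tetrahedron (1.0 = unchanged)."""
    return tet_volume(*deformed) / tet_volume(*rest)


def blend_weights(test_change, example_changes, temperature=0.1):
    """Softmax over negative distances between volume changes (assumed weighting).

    Examples whose local volume change is closest to the test-time change
    receive the largest weight. The weights sum to 1.
    """
    scores = [-abs(test_change - c) / temperature for c in example_changes]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical data: a unit tetrahedron stretched along x at test time.
rest = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
stretched = [(0, 0, 0), (1.9, 0, 0), (0, 1, 0), (0, 0, 1)]

# Assumed volume changes of the same region in three training expressions:
# neutral, a compressing extreme pose, and a stretching extreme pose.
example_changes = [1.0, 0.5, 2.0]

weights = blend_weights(volume_change(rest, stretched), example_changes)
```

In this toy setup the stretching example dominates the blend, because its local volume change is the closest match to the test-time deformation. The appearance of each region would then be a weighted combination of the per-expression appearances.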
We present video sequences generated by models trained on the datasets used in the paper. We directly compare visualizations from our method with two baselines: VolTeMorph [1] trained on all frames in the data, and VolTeMorph trained on the single most extreme expression. We start by showing the extrapolation capabilities of all methods by directly modifying the expression vector $\mathbf{e}$. We then show renderings driven by external expression data from the Multiface dataset [2]. We end this page by showing how these approaches perform on the synthetic datasets. We additionally show results from baselines that are conditioned on the expression vector. Our method generates the most realistic images while retaining controllability of the face.
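One simple way to picture the extrapolation experiment is to move an expression vector past the range seen in training. The sketch below assumes the expression vector is a plain list of coefficients and that extrapolation is linear; the page does not specify how $\mathbf{e}$ is modified, so the function name and the blend factor `t` are hypothetical.

```python
def extrapolate_expression(e_neutral, e_target, t):
    """Linearly move from e_neutral towards e_target.

    t = 0 returns the neutral vector, t = 1 returns the target, and
    t > 1 extrapolates beyond the target expression (assumed scheme).
    """
    if len(e_neutral) != len(e_target):
        raise ValueError("expression vectors must have the same length")
    return [(1.0 - t) * n + t * x for n, x in zip(e_neutral, e_target)]


# Hypothetical 3-coefficient expression vectors.
neutral = [0.0, 0.0, 0.0]
smile = [1.0, 0.5, 0.0]

# Push the expression 50% beyond the target.
exaggerated = extrapolate_expression(neutral, smile, 1.5)
```

Values of `t` between 0 and 1 interpolate between the two expressions, while values above 1 produce expressions more extreme than the target.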
Because we work with Neural Radiance Fields, we can render arbitrary camera movements, which may be useful for future communication devices.
We show results from two sequences: a bending cylinder and a twisting cylinder, both with rubber-like material properties. We generated these datasets using Houdini and Blender.
@inproceedings{kania2023blendfields,
  title = {{BlendFields: Few-Shot Example-Driven Facial Modeling}},
  author = {Kania, Kacper and Garbin, Stephan J. and Tagliasacchi, Andrea and Estellers, Virginia and Yi, Kwang Moo and Valentin, Julien and Trzci{\'n}ski, Tomasz and Kowalski, Marek},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year = {2023}
}
The work was partly supported by the Natural Sciences and Engineering Research Council of Canada (NSERC), the Digital Research Alliance of Canada, and Microsoft Mesh Labs. This research was funded by Microsoft Research through the EMEA PhD Scholarship Programme. We thank NVIDIA Corporation for granting us access to GPUs through NVIDIA's Academic Hardware Grants Program. This research was partially funded by the National Science Centre, Poland (grants no. 2020/39/B/ST6/01511 and 2022/45/B/ST6/02817).