Few-Shot Example-Driven Facial Modeling with Radiance Fields

Supplementary Material

Kacper Kania^1,2,†

Stephan J. Garbin³

Andrea Tagliasacchi^4,5,‡

Virginia Estellers³

Kwang Moo Yi²

Julien Valentin³

Tomasz Trzciński^1,6,7

Marek Kowalski³

Warsaw University of Technology¹

University of British Columbia²

Microsoft³

Simon Fraser University⁴

Google Brain⁵

IDEAS NCBR⁶

Tooploox⁷

^†Work done during an internship at Microsoft Research Cambridge.
^‡Work done at Simon Fraser University.

Abstract

Generating faithful visualizations of human faces requires capturing both coarse and fine-level details of the face geometry and appearance. Existing methods are either data-driven, requiring an extensive corpus of data not publicly accessible to the research community, or fail to capture fine details because they rely on geometric face models that cannot represent fine-grained details in texture with a mesh discretization and linear deformation designed to model only a coarse face geometry. We introduce a method that bridges this gap by drawing inspiration from traditional computer graphics techniques. Unseen expressions are modeled by blending appearance from a sparse set of extreme poses. This blending is performed by measuring local volumetric changes in those expressions and locally reproducing their appearance whenever a similar expression is performed at test time. We show that our method generalizes to unseen expressions, adding fine-grained effects on top of smooth volumetric deformations of a face, and demonstrate how it generalizes beyond faces.
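The blending idea described above can be illustrated with a toy sketch. This is not the paper's implementation: the function names, the use of a tetrahedron volume ratio as the "local volumetric change", the softmax weighting, and the `temperature` parameter are all assumptions made for illustration only.

```python
import math

def tet_volume(p0, p1, p2, p3):
    """Signed volume of a tetrahedron given its four 3D vertices."""
    a = [p1[i] - p0[i] for i in range(3)]
    b = [p2[i] - p0[i] for i in range(3)]
    c = [p3[i] - p0[i] for i in range(3)]
    det = (a[0] * (b[1] * c[2] - b[2] * c[1])
           - a[1] * (b[0] * c[2] - b[2] * c[0])
           + a[2] * (b[0] * c[1] - b[1] * c[0]))
    return det / 6.0

def volume_change(rest_tet, deformed_tet):
    """Ratio of deformed to rest volume; 1.0 means no local compression or stretch."""
    return tet_volume(*deformed_tet) / tet_volume(*rest_tet)

def blend_weights(test_change, example_changes, temperature=0.1):
    """Softmax over negative distances between the test-time volume change and
    the change observed in each example expression (hypothetical weighting)."""
    scores = [-abs(test_change - c) / temperature for c in example_changes]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def blend_color(weights, example_colors):
    """Weighted average of the RGB colors predicted for each example expression."""
    return tuple(sum(w * col[k] for w, col in zip(weights, example_colors))
                 for k in range(3))
```

In this sketch, a region whose local volume change at test time matches the change seen in one extreme example receives most of its appearance from that example, which mirrors the behaviour the abstract describes.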

Generated sequences

We present video sequences generated by models trained on the datasets used in the paper. We directly compare renderings from our method with two baselines: VolTeMorph [1] trained on all the frames in the data (VolTeMorph_avg) and VolTeMorph trained on the single most extreme expression (VolTeMorph₁). We start by showing the extrapolation capabilities of all methods by modifying the expression vector $\mathbf{e}$ directly. We then show renderings driven by external expression data from the Multiface dataset [2]. We end this page by showing how these approaches perform on the synthetic datasets. We additionally show results from baselines that are conditioned directly on the expression vector. Our method generates the most realistic images while retaining control over the face.
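One simple way to modify an expression vector directly is to scale it along the line from a neutral expression to a target expression. The sketch below assumes this linear scheme; the function names, the list-of-floats representation of $\mathbf{e}$, and the `max_scale` and `steps` parameters are hypothetical and not taken from the paper.

```python
def extrapolate_expression(neutral, target, scale):
    """Move from `neutral` toward `target`.

    scale = 0 gives the neutral expression, scale = 1 gives the target,
    and scale > 1 extrapolates beyond the target expression.
    """
    return [n + scale * (t - n) for n, t in zip(neutral, target)]

def expression_sweep(neutral, target, max_scale=1.5, steps=7):
    """Evenly spaced expression vectors from neutral up to `max_scale` times the target offset."""
    return [extrapolate_expression(neutral, target, max_scale * i / (steps - 1))
            for i in range(steps)]

# Example: sweep a two-coefficient expression vector past its training value.
frames = expression_sweep([0.0, 0.0], [1.0, 0.5])
```

Each vector in `frames` would then be fed to the model as the driving expression for one rendered frame.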

Novel Expressions

[Video comparisons for subjects #2183941, #5372021, #6795937, and #7889059. Panels per subject: InstantBlendFields, BlendFields, VolTeMorph_avg, VolTeMorph₁.]

Comparison of the visual convergence between InstantBlendFields and BlendFields.

[Video panels: InstantBlendFields, BlendFields.]

Avatar Generation

[Video comparisons for subjects #2183941, #5372021, #6795937, and #7889059. Panels per subject: Ground Truth, InstantBlendFields, BlendFields, VolTeMorph_avg, VolTeMorph₁.]

Free camera motion

Because our method builds on Neural Radiance Fields, we can render the face along arbitrary camera trajectories, which may be useful for future communication devices.
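As an illustration of a free camera trajectory, the sketch below generates camera-to-world poses on a circular orbit around a target point. The pose layout (3x4 matrix with columns right, up, back, position) and the convention that the camera looks down its negative z-axis are assumptions, as are all function names and parameters; the renderer used for the videos may use a different convention.

```python
import math

def normalize(v):
    """Scale a 3D vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross(a, b):
    """Cross product of two 3D vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def look_at(eye, target, up=(0.0, 1.0, 0.0)):
    """3x4 camera-to-world pose for a camera at `eye` looking at `target`.

    Assumes the camera looks down its -z axis and that the viewing
    direction is not parallel to `up` (otherwise the cross product vanishes).
    """
    back = normalize([e - t for e, t in zip(eye, target)])  # camera +z axis
    right = normalize(cross(up, back))                      # camera +x axis
    true_up = cross(back, right)                            # camera +y axis
    return [[right[i], true_up[i], back[i], eye[i]] for i in range(3)]

def orbit_poses(radius=1.0, height=0.0, n_frames=60, target=(0.0, 0.0, 0.0)):
    """Poses evenly spaced on a horizontal circle around `target`."""
    poses = []
    for k in range(n_frames):
        angle = 2.0 * math.pi * k / n_frames
        eye = [target[0] + radius * math.sin(angle),
               target[1] + height,
               target[2] + radius * math.cos(angle)]
        poses.append(look_at(eye, target))
    return poses
```

Rendering one image per pose in `orbit_poses()` yields a turntable-style video; other trajectories only require a different sequence of `eye` positions.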

Synthetic Data

We show results from two sequences, one of a bending cylinder and one of a twisting cylinder, both with rubber-like material properties. We generated these datasets using Houdini and Blender.

[Video comparisons for the two cylinder sequences. Panels per sequence: Ground Truth, InstantBlendFields, BlendFields, VolTeMorph_avg, VolTeMorph₁.]

Bibtex

@inproceedings{kania2023blendfields,
    title     = {{BlendFields: Few-Shot Example-Driven Facial Modeling}},
    author    = {
        Kania, Kacper 
        and Garbin, Stephan J. 
        and Tagliasacchi, Andrea 
        and Estellers, Virginia 
        and Yi, Kwang Moo 
        and Valentin, Julien 
        and Trzci{\'n}ski, Tomasz 
        and Kowalski, Marek 
    },
    booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
    year      = {2023}
}

Acknowledgements

The work was partly supported by the Natural Sciences and Engineering Research Council of Canada (NSERC), the Digital Research Alliance of Canada, and Microsoft Mesh Labs. This research was funded by Microsoft Research through the EMEA PhD Scholarship Programme. We thank NVIDIA Corporation for granting us access to GPUs through NVIDIA's Academic Hardware Grants Program. This research was partially funded by the National Science Centre, Poland (grants no. 2020/39/B/ST6/01511 and 2022/45/B/ST6/02817).