DeepFaceLab

9.5 DeepFaceLab

Created Date: 2025-07-10

Defending against deepfakes requires research not only on detection but also on generation methods. However, current deepfake methods suffer from obscure workflows and poor performance.

To solve this problem, we present DeepFaceLab, currently the dominant deepfake framework for face-swapping. It provides the necessary tools and an easy-to-use way to conduct high-quality face-swapping. It also offers a flexible, loosely coupled structure for people who need to extend their pipeline with other features without writing complicated boilerplate code.

We detail the principles that drive the implementation of DeepFaceLab and introduce its pipeline, every aspect of which users can modify painlessly to suit their own customization needs.

It is noteworthy that DeepFaceLab can achieve cinema-quality results with high fidelity. We demonstrate the advantage of our system by comparing our approach with other face-swapping methods.

9.5.1 Introduction

As deep learning has empowered computer vision in recent years, the manipulation of digital images, especially human portrait images, has improved rapidly and achieves photorealistic results in most cases. Face swapping is an eye-catching task in generating fake content: it transfers a source face to the destination while maintaining the destination’s facial movements and expression deformations.

The fundamental driving force behind face manipulation techniques is the Generative Adversarial Network (GAN). Faces synthesized by StyleGAN and StyleGAN2 are becoming increasingly realistic and entirely indistinguishable to the human visual system.

This paper introduces DeepFaceLab, an integrated open-source system with a clean-slate pipeline design that achieves photorealistic face-swapping results without painful tuning. DeepFaceLab has proved very popular with the public. For instance, many artists create DeepFaceLab-based videos and publish them on their YouTube channels. These videos have more than 100 million hits.

The contributions of DeepFaceLab are threefold:

A state-of-the-art framework consisting of a mature pipeline is proposed, aiming to achieve photorealistic face-swapping results.
DeepFaceLab open-sourced its code in 2018 and has kept pace with progress in computer vision ever since, making a positive contribution to deepfake defense; it has drawn broad attention in the open-source community and the VFX industry.
A series of high-efficiency components and tools is introduced in DeepFaceLab to build better face-swapping videos.

9.5.2 Characteristics of DeepFaceLab

9.5.3 Pipeline

DeepFaceLab provides a set of workflows that form a flexible pipeline. In DeepFaceLab (DFL for short), the pipeline can be abstracted into three phases: extraction, training, and conversion. These three phases are presented sequentially. It is also noteworthy that DFL follows a typical one-to-one face-swapping paradigm, which means there are only two kinds of data: src and dst, abbreviations for source and destination, which are used in the following narrative.

9.5.3.1 Extraction

The extraction phase is the first phase in DFL, aiming to extract faces from the src and dst data. This phase consists of several algorithms and processing steps, namely face detection, face alignment, and face segmentation. DFL provides several extraction modes (e.g., half-face, full-face, whole-face), which determine the face coverage area of the extraction phase. Full-face mode is generally used by default.

Face Detection

The first step in the extraction phase is to find the target face in the given data, src and dst. DFL uses S3FD as its default face detector. S3FD can be replaced painlessly with other face detection algorithms, e.g., RetinaFace.
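The swappable-detector idea can be illustrated with a minimal Python sketch. Everything named here (`Box`, `Detector`, `pick_target_face`, and the rule of keeping the largest box) is a hypothetical illustration, not DFL's actual API. A detector is assumed to be any callable that maps an image to a list of bounding boxes, so S3FD, RetinaFace, or another detector could be wrapped behind the same signature.

```python
from typing import Any, Callable, List, Optional, Tuple

# Hypothetical types, not part of DFL: a box is (left, top, right, bottom) in pixels.
Box = Tuple[int, int, int, int]
Detector = Callable[[Any], List[Box]]


def box_area(box: Box) -> int:
    """Area of a box in pixels; degenerate boxes count as zero."""
    left, top, right, bottom = box
    return max(0, right - left) * max(0, bottom - top)


def pick_target_face(image: Any, detector: Detector) -> Optional[Box]:
    """Run any detector and keep the largest box (an assumed selection rule)."""
    boxes = detector(image)
    if not boxes:
        return None
    return max(boxes, key=box_area)
```

Because the rest of the code depends only on the `Detector` signature, swapping one detector for another would mean passing a different callable, with no change to `pick_target_face`.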

Face Alignment

The second step is face alignment. After numerous experiments and failures, we realized that facial landmarks are the key to maintaining stability over time. An effective facial landmark algorithm is therefore essential for producing excellent continuous footage in shots and films.

DFL provides two canonical types of facial landmark extraction algorithms to solve this: (a) the heatmap-based facial landmark algorithm 2DFAN (for faces with a standard pose) and (b) PRNet with 3D face prior information (for faces with large Euler angles in yaw, pitch, or roll; e.g., a face with a large yaw angle has one side out of sight).

After facial landmarks are retrieved, we also provide an optional function with a configurable time step to smooth the facial landmarks of consecutive frames in a single shot, which further improves stability.
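One simple way to realize such smoothing is a centered moving average over neighboring frames. The sketch below assumes that form; the function name `smooth_landmarks` and the `window` parameter (standing in for the configurable time step) are illustrative, and DFL's own smoothing may differ.

```python
def smooth_landmarks(frames, window=3):
    """Centered moving average of landmarks over consecutive frames.

    frames: list of per-frame landmark lists, each [(x, y), ...] of equal length.
    window: odd number of frames to average over (hypothetical parameter).
    Near the start and end of the shot, the window is truncated to the
    frames that exist.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for t in range(len(frames)):
        lo = max(0, t - half)
        hi = min(len(frames), t + half + 1)
        n = hi - lo
        points = []
        for k in range(len(frames[t])):
            x = sum(frames[i][k][0] for i in range(lo, hi)) / n
            y = sum(frames[i][k][1] for i in range(lo, hi)) / n
            points.append((x, y))
        smoothed.append(points)
    return smoothed
```

A larger `window` suppresses more frame-to-frame jitter but also lags behind fast head motion, which is one reason a configurable value is useful.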

Then we adopt a classical point pattern mapping and transformation method proposed by Umeyama to calculate a similarity transformation matrix used for face alignment.

Because Umeyama's method needs a standard facial landmark template to calculate the similarity transformation matrix, DFL provides a canonical aligned facial landmark template. It is noteworthy that DFL can also automatically estimate the Euler angles from the obtained facial landmarks.
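For 2D landmarks, the least-squares similarity transform (rotation, uniform scale, translation) that Umeyama's method solves has a closed form when points are treated as complex numbers. The sketch below uses that 2D shortcut instead of the SVD formulation of the general method. The function names and the three-point example in the usage note are hypothetical, not DFL's.

```python
def estimate_similarity(src, dst):
    """Least-squares 2D similarity transform mapping src points onto dst.

    Points are treated as complex numbers z = x + iy, and the transform is
    w = a * z + b, where abs(a) is the scale, the phase of a is the rotation
    angle, and b is the translation. Returns the pair (a, b).
    """
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two matching point pairs")
    z = [complex(x, y) for x, y in src]
    w = [complex(x, y) for x, y in dst]
    z_mean = sum(z) / len(z)
    w_mean = sum(w) / len(w)
    # Closed-form least-squares solution on the centered points.
    num = sum((zi - z_mean).conjugate() * (wi - w_mean) for zi, wi in zip(z, w))
    den = sum(abs(zi - z_mean) ** 2 for zi in z)
    if den == 0:
        raise ValueError("source points are all identical")
    a = num / den
    b = w_mean - a * z_mean
    return a, b


def to_affine_matrix(a, b):
    """Express w = a * z + b as a 2x3 affine matrix acting on (x, y, 1)."""
    return [[a.real, -a.imag, b.real],
            [a.imag, a.real, b.imag]]


def apply_affine(matrix, point):
    """Apply a 2x3 affine matrix to a single (x, y) point."""
    x, y = point
    return (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])
```

For example, `estimate_similarity([(0, 0), (1, 0), (0, 1)], [(1, 1), (1, 3), (-1, 1)])` recovers a scale of 2, a 90° rotation, and a translation of (1, 1), up to floating-point error. In an alignment setting, `src` would be the detected landmarks and `dst` the landmark template, and the resulting matrix would be used to warp the face into the aligned frame.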

Face Segmentation

After face alignment, a data folder of faces in a standard front or side view (aligned src or aligned dst) is obtained.