DeePhy Database

DeePhy is a novel DeepFake Phylogeny dataset of 5040 DeepFake videos generated using three face-swapping techniques (FaceShifter, FaceSwap and FSGAN). It is one of the first datasets to incorporate the concept of DeepFake Phylogeny: creating a DeepFake by applying multiple generation techniques one after another.

The dataset can be used for (i) DeepFake detection, (ii) model attribution of DeepFakes and (iii) predicting the sequential order of the DeepFake techniques used to create phylogenetic DeepFakes. It is intended to support real-world applications such as plagiarism detection, forgery detection and reverse engineering of DeepFakes.

Dataset Statistics/Size/Format

The dataset consists of 100 real videos and 5040 DeepFake videos generated through one or more iterations of face swapping: 840 one-time swapped DeepFakes (Iteration 1), 2520 two-times swapped DeepFakes (Iteration 2) and 1680 three-times swapped DeepFakes (Iteration 3). The total raw size of the dataset is approximately 30 GB. Videos are approximately 20 seconds long on average, and all are 720p at 25 frames per second in MPEG-4 (.mp4) format.
DeePhy dataset (CRC32: 4977e5c7, MD5: 9c47fe0b5ee291392ed43edf69d245e3)
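After downloading, the archive can be checked against the published CRC32 and MD5 values. The following is a minimal sketch using only the Python standard library (hashlib and zlib); the function names are hypothetical, and the archive file name used below is an assumption, since the actual download name is not given here.

```python
import hashlib
import zlib

# Checksums published for the DeePhy archive.
EXPECTED_CRC32 = "4977e5c7"
EXPECTED_MD5 = "9c47fe0b5ee291392ed43edf69d245e3"


def checksums_of_chunks(chunks):
    """Return (crc32_hex, md5_hex) for an iterable of byte chunks."""
    crc = 0
    md5 = hashlib.md5()
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)  # running CRC32 carried across chunks
        md5.update(chunk)
    return format(crc & 0xFFFFFFFF, "08x"), md5.hexdigest()


def verify_archive(path, chunk_size=1 << 20):
    """Stream the file in 1 MiB chunks and compare with the published values."""
    with open(path, "rb") as fh:
        chunks = iter(lambda: fh.read(chunk_size), b"")
        crc_hex, md5_hex = checksums_of_chunks(chunks)
    return crc_hex == EXPECTED_CRC32 and md5_hex == EXPECTED_MD5
```

For example, `verify_archive("DeePhy.zip")` (hypothetical file name) returns True only if both checksums match. Reading in chunks keeps memory use constant for a roughly 30 GB download.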

Dataset Annotations

The real videos of subjects are taken from YouTube, a publicly accessible platform, and are diverse in gender, face orientation, skin tone, face size (in pixels), lighting conditions, background and presence of occlusion. The DeePhy dataset is annotated with 11 attributes: Gender, Age, Skin Color, 5oClockshadow (five o'clock shadow), Beard, Moustache, Spectacles, Shades, Mic, Cap/Turban/Hijab/Scarf and Hair Occlusion. Gender is annotated as "Male" or "Female". Age is divided into three categories based on apparent age: "Young Adult" for 18 (inclusive) to 30 (exclusive), "Adult" for 30 (inclusive) to 55 (exclusive) and "Old" for 55 and above. Skin Color takes values from 1 to 6, corresponding to the six skin types of the Fitzpatrick scale. All other attributes are binary, with "Y" indicating that the attribute is present and "N" that it is absent.
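The annotation rules above can be expressed as small helpers. This is a sketch under stated assumptions: the storage format of the annotations (file type, column names) is not described here, so these hypothetical functions operate on single values only, and the age boundaries follow the inclusive/exclusive ranges stated in the text.

```python
def age_group(apparent_age):
    """Map an apparent age (in years) to the DeePhy age category."""
    if apparent_age < 18:
        # The categories as described start at 18.
        raise ValueError("no DeePhy age category below 18")
    if apparent_age < 30:
        return "Young Adult"  # 18 <= age < 30
    if apparent_age < 55:
        return "Adult"  # 30 <= age < 55
    return "Old"  # age >= 55


def parse_binary(value):
    """Convert a 'Y'/'N' attribute value to True/False."""
    value = value.strip().upper()
    if value not in ("Y", "N"):
        raise ValueError(f"expected 'Y' or 'N', got {value!r}")
    return value == "Y"


def parse_skin_color(value):
    """Validate a Fitzpatrick skin type (1 to 6) and return it as an int."""
    skin = int(value)
    if not 1 <= skin <= 6:
        raise ValueError(f"Fitzpatrick type must be 1-6, got {skin}")
    return skin
```

Raising on out-of-range values, rather than silently mapping them, makes malformed annotation rows easy to spot.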

Dataset Directory Structure and File Naming

– fake
  – iteration 1
    – faceshifter
      – female
      – male
    – faceswap
    – fsgan
  – iteration 2
    – faceshifter_faceshifter
    – faceshifter_faceswap
    – faceshifter_fsgan
    – faceswap_faceshifter
    – faceswap_faceswap
    – faceswap_fsgan
    – fsgan_faceshifter
    – fsgan_faceswap
    – fsgan_fsgan
  – iteration 3
    – faceshifter_faceswap_fsgan
    – faceshifter_fsgan_faceswap
    – faceswap_faceshifter_fsgan
    – faceswap_fsgan_faceshifter
    – fsgan_faceshifter_faceswap
    – fsgan_faceswap_faceshifter
– real
  – female
  – male
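The folder layout follows a simple pattern: Iteration 1 has one folder per technique, Iteration 2 has every ordered pair of techniques (repeats allowed, 3 × 3 = 9), and Iteration 3 has only orderings of three distinct techniques (3! = 6). The sketch below regenerates these folder names with itertools and relates them to the video counts in the statistics. The function name is hypothetical, and the per-folder figure is only an average: the text does not state that videos are split evenly across folders.

```python
from itertools import permutations, product

TECHNIQUES = ("faceshifter", "faceswap", "fsgan")

# Fake-video counts per iteration, from the dataset statistics.
VIDEOS_PER_ITERATION = {1: 840, 2: 2520, 3: 1680}


def sequence_dirs(iteration):
    """Technique-sequence folder names for one iteration, sorted."""
    if iteration == 1:
        seqs = [(t,) for t in TECHNIQUES]  # 3 folders
    elif iteration == 2:
        seqs = product(TECHNIQUES, repeat=2)  # ordered pairs, repeats allowed: 9
    elif iteration == 3:
        seqs = permutations(TECHNIQUES)  # orderings of 3 distinct techniques: 6
    else:
        raise ValueError(f"unknown iteration {iteration}")
    return sorted("_".join(s) for s in seqs)


if __name__ == "__main__":
    for it, n_videos in VIDEOS_PER_ITERATION.items():
        dirs = sequence_dirs(it)
        print(f"iteration {it}: {len(dirs)} folders, "
              f"{n_videos / len(dirs):.0f} videos per folder on average")
```

Sorting the joined names reproduces the folder order listed above, and each iteration averages 280 videos per technique sequence.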

The videos are named so that the source and target videos can be identified from the file name.

Iteration 1 →
x_y.mp4: x is the target video and y is the source video, so the background comes from video x and the face from video y.

Iteration 2 →
x_y_z.mp4: x is the target video and y and z are the source videos. The background comes from video x; the face from video y is swapped in first, and the face from video z is then swapped in over it.

Iteration 3 →
x_y_z_w.mp4: x is the target video and y, z and w are the source videos. The background comes from video x, and the faces are swapped in the order y, z, w.
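Combining the file name with its folder yields labels for the three tasks: real vs. fake (detection), the techniques used (attribution) and their order (sequence prediction). The sketch below rests on assumptions that are not confirmed by the description: video IDs contain no underscores, an optional female/male folder may sit below the technique folder, and the i-th technique in the folder name produced the i-th swap. The function name and returned keys are hypothetical.

```python
from pathlib import PurePosixPath

GENDER_DIRS = {"female", "male"}


def parse_fake_path(path):
    """Split a fake-video path into target, ordered sources and techniques."""
    p = PurePosixPath(path)
    seq_dir = p.parent
    if seq_dir.name in GENDER_DIRS:  # e.g. iteration 1/faceshifter/female/
        seq_dir = seq_dir.parent
    techniques = seq_dir.name.split("_")  # assumed order of application
    ids = p.stem.split("_")  # assumes IDs contain no underscores
    target, sources = ids[0], ids[1:]
    if len(sources) != len(techniques):
        raise ValueError(f"{path}: {len(sources)} sources but "
                         f"{len(techniques)} techniques")
    return {
        "is_fake": True,          # detection label
        "target": target,         # background video x
        "sources": sources,       # faces swapped in this order
        "techniques": techniques, # attribution / sequence label
        "iteration": len(sources),
    }
```

For instance, a file 12_34_56.mp4 inside an fsgan_faceswap folder would parse to target "12", sources ["34", "56"] and techniques ["fsgan", "faceswap"].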

License Agreement + Citation

To obtain the password for the compressed file, email the completed license agreement to databases@iab-rubric.org with the subject line "License agreement for DeePhy".
NOTE: The license agreement must be signed by someone with the legal authority to sign on behalf of the institute, such as the head of the institution or the registrar. License agreements signed by anyone else will not be processed.

This database is available only for research and educational purposes and not for any commercial use. If you use the database in any publication or report, you must cite the following paper:
K. Narayan, H. Agarwal, K. Thakral, S. Mittal, M. Vatsa, and R. Singh, "DeePhy: On DeepFake Phylogeny," International Joint Conference on Biometrics (IJCB), 2022.