DeePhy is a novel DeepFake Phylogeny dataset consisting of 5040 DeepFake videos generated using three different generation techniques. It is one of the first datasets to incorporate the concept of DeepFake Phylogeny, which refers to generating DeepFakes by applying multiple generation techniques sequentially.
The dataset can be used for the tasks of (i) DeepFake detection, (ii) model attribution of DeepFakes, and (iii) prediction of the sequential order of the DeepFake techniques employed to create phylogenetic DeepFakes. It is intended to facilitate advancements in real-life scenarios such as plagiarism detection, forgery detection, and reverse engineering of DeepFakes.
The dataset consists of 100 real videos and 5040 DeepFake videos generated using several iterations of face swapping. There are 840 videos of once-swapped DeepFakes (Iteration 1 DeepFakes), 2520 videos of twice-swapped DeepFakes (Iteration 2 DeepFakes), and 1680 videos of thrice-swapped DeepFakes (Iteration 3 DeepFakes). The total raw size of the dataset is approximately 30 GB. The average duration of the videos is approximately 20 seconds, and all videos are in 720p resolution at 25 frames per second. The videos are in MPEG4.0 format.
DeePhy dataset (CRC32: 4977e5c7, MD5: 9c47fe0b5ee291392ed43edf69d245e3)
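The published checksums can be used to confirm that a download is intact. The following is a minimal sketch, assuming the dataset arrives as a single archive file; the path passed to `matches_published` is up to the user, and the helper names `file_checksums` and `matches_published` are hypothetical, not part of any official DeePhy tooling.

```python
import hashlib
import zlib

# Checksums as published on this page.
EXPECTED_CRC32 = "4977e5c7"
EXPECTED_MD5 = "9c47fe0b5ee291392ed43edf69d245e3"


def file_checksums(path, chunk_size=1 << 20):
    """Return (crc32_hex, md5_hex) for a file, reading it in chunks
    so that a ~30 GB archive never has to fit in memory."""
    crc = 0
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)  # running CRC32 over all chunks
            md5.update(chunk)
    return format(crc & 0xFFFFFFFF, "08x"), md5.hexdigest()


def matches_published(path):
    """True if both checksums of the file equal the published values."""
    crc_hex, md5_hex = file_checksums(path)
    return crc_hex == EXPECTED_CRC32 and md5_hex == EXPECTED_MD5
```

Calling `matches_published` on the downloaded file (whatever name it was saved under) returns `False` if the download was truncated or corrupted.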
The real videos of subjects are taken from YouTube, a publicly accessible platform, and have a diverse distribution in gender, orientation, skin tone, size of face (in pixels), lighting conditions, background, and presence of occlusion. The DeePhy dataset is annotated with 10 attributes: Gender, Age, Skin Color, 5oClockshadow, Beard, Moustache, Spectacles, Shades, Mic, Cap/Turban/Hijab/Scarf and Hair Occlusion. Gender is annotated as “Male” or “Female”. Age is divided into three categories: subjects with an apparent age from 18 (inclusive) to 30 are labeled “Young Adult”, those with an apparent age from 30 (inclusive) to 55 are labeled “Adult”, and those with an apparent age of 55 or above are labeled “Old”. The Skin Color annotations range from 1 to 6, corresponding to the 6 skin types of the Fitzpatrick scale. All other attributes are binary, with the value “Y” indicating the presence of the attribute and “N” its absence.
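The annotation scheme above can be sketched in code. This is only an illustration: the page does not specify the annotation file format, so representing each record as a dictionary keyed by the attribute names is an assumption, as are the helper names `age_category` and `invalid_fields` and the decision to reject apparent ages below 18.

```python
# Binary attributes take the value "Y" (present) or "N" (absent).
BINARY_ATTRIBUTES = (
    "5oClockshadow", "Beard", "Moustache", "Spectacles",
    "Shades", "Mic", "Cap/Turban/Hijab/Scarf", "Hair Occlusion",
)

# Allowed values per attribute, as described in the text.
ALLOWED = {
    "Gender": {"Male", "Female"},
    "Age": {"Young Adult", "Adult", "Old"},
    "Skin Color": {1, 2, 3, 4, 5, 6},  # Fitzpatrick scale types
}
for _name in BINARY_ATTRIBUTES:
    ALLOWED[_name] = {"Y", "N"}


def age_category(apparent_age):
    """Map an apparent age to its category: [18, 30) is "Young Adult",
    [30, 55) is "Adult", and 55 or above is "Old"."""
    if apparent_age < 18:
        # Assumption: ages under 18 are outside the annotated range.
        raise ValueError("apparent age below 18 is not covered")
    if apparent_age < 30:
        return "Young Adult"
    if apparent_age < 55:
        return "Adult"
    return "Old"


def invalid_fields(record):
    """Return the sorted names of attributes that are missing from the
    record or hold a value outside the allowed set."""
    return sorted(
        name for name, allowed in ALLOWED.items()
        if record.get(name) not in allowed
    )
```

A record passes the check when `invalid_fields(record)` returns an empty list.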
The video files are named so that the source and target videos of each DeepFake can be identified.
Iteration 1 →
x_y.mp4. Here x is the target video and y is the source video: the background comes from video x and the face comes from video y.
Iteration 2 →
x_y_z.mp4. Here x is the target video and y, z are the source videos: the background comes from video x, the face from video y is pasted first, and the face from video z is then pasted over it.
Iteration 3 →
x_y_z_w.mp4. Here x is the target video and y, z, w are the source videos: the background comes from video x and the faces are pasted in the order y, z, w.
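The naming scheme above can be decoded programmatically. The following is a minimal sketch, assuming that the individual video identifiers contain no underscores themselves; `SwapChain` and `parse_deephy_name` are hypothetical names introduced for illustration, and the file name in the usage note is made up.

```python
from pathlib import PurePath
from typing import NamedTuple, Tuple


class SwapChain(NamedTuple):
    target: str               # video supplying the background (x)
    sources: Tuple[str, ...]  # face videos in pasting order (y, z, w)

    @property
    def iteration(self) -> int:
        """Number of face swaps applied (1, 2, or 3)."""
        return len(self.sources)


def parse_deephy_name(filename: str) -> SwapChain:
    """Split a name such as 'x_y_z.mp4' into its target and ordered sources."""
    parts = PurePath(filename).stem.split("_")
    # One target plus one to three sources; no empty components allowed.
    if not 2 <= len(parts) <= 4 or not all(parts):
        raise ValueError(f"unexpected DeepFake file name: {filename!r}")
    return SwapChain(target=parts[0], sources=tuple(parts[1:]))
```

For example, `parse_deephy_name("12_45_7.mp4")` yields target `"12"`, sources `("45", "7")`, and `iteration` equal to 2. Real videos, whose names carry no source component, are rejected with a `ValueError` and should be handled separately.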
License Agreement + Citation