
Commercial

DeepFake Videos Dataset

The deepfake dataset contains real and AI-generated deepfake videos, featuring diverse subjects with detailed metadata on age, gender, and ethnicity to help train powerful deepfake detectors.

Files: 5,000
People: 5,000

Facial Recognition
Computer Vision
Machine learning
Data generation
Security


Dataset Info

Characteristic	Data
Description	Real videos of people with AI-generated faces overlaid; the subjects turn their heads in different directions
Data types	Video
Tasks	Facial recognition, Computer Vision
Total number of files	5,000
Number of people	5,000
Video generation sites	aisaver.io, faceswapvideo.ai, magichour.ai
Labeling	Metadata (age, gender, ethnicity)
Gender	Male, Female
Ethnicity	Asian (30%), African (70%)
Age (years)	Min = 18, max = 80, mean = 45
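
The value ranges in the table above can double as a sanity check when metadata is loaded. The following is a minimal sketch, not the vendor's tooling: the record layout and the field names (`file_name`, `age`, `gender`, `ethnicity`) are assumptions made for illustration, and only the allowed values come from the table.

```python
from dataclasses import dataclass

# Allowed values are taken from the Dataset Info table. The record layout
# (field names, one record per video) is an assumption for illustration.
ALLOWED_GENDERS = {"Male", "Female"}
ALLOWED_ETHNICITIES = {"Asian", "African"}
MIN_AGE, MAX_AGE = 18, 80


@dataclass
class VideoMetadata:
    file_name: str  # hypothetical field name
    age: int
    gender: str
    ethnicity: str


def validate(record: VideoMetadata) -> list[str]:
    """Return a list of problems found; an empty list means the record passes."""
    problems = []
    if not MIN_AGE <= record.age <= MAX_AGE:
        problems.append(f"age {record.age} outside {MIN_AGE}-{MAX_AGE}")
    if record.gender not in ALLOWED_GENDERS:
        problems.append(f"unexpected gender {record.gender!r}")
    if record.ethnicity not in ALLOWED_ETHNICITIES:
        problems.append(f"unexpected ethnicity {record.ethnicity!r}")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a delivery at once.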

Statistics

Charts (images not reproduced): distribution by age, distribution by video duration, distribution by gender, distribution by ethnicity.

Technical Characteristics

Characteristic	Data
Video extension	mp4, MOV
Video Resolutions	1920 x 1080p, 480 x 360p, 1280 x 720p, 720 x 480p, 640 x 480p, 1920 x 920p
Video duration	Mean = 9, median = 9, min = 2, max = 34
Frames per second	Mean = 26.6
Devices	iPhone 13 (30%), Google Pixel (70%)

Source and collection methodology. Data was collected by overlaying generated faces onto real videos using the following websites: aisaver.io, faceswapvideo.ai, and magichour.ai.
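
As a rough illustration of how the technical characteristics above might be used, the sketch below parses the resolution strings as they are written in the table and checks a file name against the listed extensions. The function names are hypothetical, and the code only inspects strings and file names; it does not open video files.

```python
import re
from pathlib import Path

# Values copied from the Technical Characteristics table.
LISTED_EXTENSIONS = {".mp4", ".mov"}
LISTED_RESOLUTIONS = [
    "1920 x 1080p", "480 x 360p", "1280 x 720p",
    "720 x 480p", "640 x 480p", "1920 x 920p",
]

# Matches "<width> x <height>" with an optional trailing "p".
_RESOLUTION = re.compile(r"^\s*(\d+)\s*x\s*(\d+)p?\s*$", re.IGNORECASE)


def parse_resolution(text: str) -> tuple[int, int]:
    """Turn a string such as '1920 x 1080p' into (width, height)."""
    match = _RESOLUTION.match(text)
    if match is None:
        raise ValueError(f"unrecognized resolution: {text!r}")
    return int(match.group(1)), int(match.group(2))


def has_listed_extension(path: str) -> bool:
    """Case-insensitive check against the listed container formats."""
    return Path(path).suffix.lower() in LISTED_EXTENSIONS


def is_listed_resolution(width: int, height: int) -> bool:
    """True if (width, height) appears in the table's resolution list."""
    listed = {parse_resolution(r) for r in LISTED_RESOLUTIONS}
    return (width, height) in listed
```

A check like this can flag files whose format falls outside what the table describes before they reach a training pipeline.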

Dataset Use Cases

Cybersecurity & Digital Forensics
Detecting Deepfake Content and Fake Videos

Deepfake Videos Dataset provides critical training data for developing deepfake detectors and detection algorithms. Containing both real videos and fake videos generated using advanced deepfake technology, it allows analysts to train detection systems that identify synthetic media and protect against identity fraud, misinformation, and other digital threats.

AI & Machine Learning Research
Training Models for Deepfake Detection

This deepfake detection dataset is widely used in machine learning and deep learning projects. The dataset consists of thousands of video clips and face images, including manually labelled examples. Models trained on this data achieve better accuracy in spotting AI-generated videos and distinguishing between real and synthetic video datasets.

Media & Journalism
Verifying Video Content Authenticity

News organizations use such datasets to enhance video detection tools that verify YouTube videos, interviews, and shared clips. By training recognition systems on datasets containing both source videos and generated faces, journalists can validate footage, identify manipulated content, and strengthen trust in digital reporting.

Technology & App Development
Building Safer Recognition and Verification Systems

Tech companies rely on the synthetic video dataset to test facial recognition and object detection systems against deepfake content. The dataset comprising high-quality video frames, fake images, and synthetic data helps in creating more reliable generative AI detection methods. This improves authentication solutions and delivers better results in protecting digital platforms.
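
When a dataset like this is used to train and evaluate a detector, one common precaution is to keep each person in only one of the train and test sets, so the model is not scored on faces it has already seen. The sketch below shows one way to do that with a deterministic hash of a person identifier. It is an illustration under assumptions: the `(file_name, person_id)` pairing and the 20% test fraction are hypothetical and are not described in the dataset documentation.

```python
import hashlib


def assign_split(person_id: str, test_fraction: float = 0.2) -> str:
    """Deterministically map a person ID to 'train' or 'test'."""
    if not 0.0 <= test_fraction <= 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    digest = hashlib.sha256(person_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    return "test" if bucket < test_fraction * 2**64 else "train"


def split_clips(clips, test_fraction: float = 0.2):
    """Group file names by split.

    `clips` is an iterable of (file_name, person_id) pairs; this pairing is
    an assumed layout, not the dataset's documented format.
    """
    splits = {"train": [], "test": []}
    for file_name, person_id in clips:
        splits[assign_split(person_id, test_fraction)].append(file_name)
    return splits
```

Because the assignment depends only on the person ID, adding new clips later never moves an existing person from one split to the other.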

FAQs

How large is DeepFake Videos Dataset compared to other available datasets?

With 5,000 video clips and diverse demographic coverage, this collection is one of the largest datasets of its kind. Its scale allows models trained on it to achieve higher accuracy in deepfake content detection and facial recognition tasks.

What devices and resolutions are represented in the dataset?

Videos were recorded on iPhone 13 devices (30%) and Google Pixel devices (70%), then processed with deepfake technology. The dataset covers multiple resolutions, including 1080p, 720p, and 480p, supporting a wide range of video detection methods.

How was the data collected?

The dataset was built by generating fake faces using AI models and overlaying them on real videos (using the following tools: aisaver.io, faceswapvideo.ai, and magichour.ai).

Is it possible to request a custom deepfake dataset?

Custom datasets can be created on request, allowing you to specify generation methods, annotation formats, or target demographics. This flexibility ensures better results for applications such as face recognition, synthetic video detection, or generative AI model training.

How are Unidata datasets licensed?

Unidata datasets follow a dual-licensing model. Free samples are available for testing and evaluation, while full datasets, including the Deepfake Videos Dataset, can be accessed exclusively through purchase.

How are Unidata datasets stored?

Unidata securely stores datasets on AWS cloud infrastructure, ensuring high availability and scalability. Our storage practices comply with ISO 27001 and ISO 27701 standards, guaranteeing internationally recognized information security and privacy management for sensitive data.

How long does it take to receive the dataset?

Once you submit a request, our team will contact you to confirm details and finalize documentation. After signing and payment, the dataset will be delivered within 3–10 business days.

Is this a real-world dataset or synthetic data?

The Deepfake Videos Dataset is a synthetic video dataset created by combining real-world source videos with AI-generated faces. This hybrid approach provides realistic training material for developing deepfake detection methods, recognition models, and generative AI research.


Unidata Cases

Digital Tree Passport Annotation for Forest Mapping

Forestry Monitoring & GIS
200,000 trees, 10 species classes
2 months


License Plate Annotation for Vehicle Recognition System

100,000 images with detailed license plate markup (bounding boxes, digits, regional symbols)
2 weeks


Sentiment Annotation for Brand Monitoring

Marketing & Consumer Insights
12,000 text samples, 3 sentiment classes (positive, negative, neutral)
3 weeks


Surveillance Video Annotation for Entrance Monitoring

Surveillance & Security
90 minutes of video from three cameras, approximately 50-60 thousand frames
2 weeks


Similar Datasets

Commercial
- Computer Vision
- Machine Learning
- Image Processing
- Security
- Anti-Spoofing
Multi-Material Fingerprint Spoofing Dataset

Multi-Material Fingerprint Spoofing Dataset contains 4,000+ fingerprint images from 100 individuals, captured with a ZKTeco ZK9500 optical scanner and including real fingerprints and spoofing attacks created with alginate, plasticine, and silicone materials. The fingerprint dataset includes metadata (gender, age, finger, hand, device) and supports biometric security research, presentation attack detection, spoof detection, and fingerprint recognition model training.

100 People
4,000+ Photos
Commercial
- Computer Vision
- Machine Learning
- Image Processing
- Security
- Anti-Spoofing
Biometric Fingerprint Spoofing Dataset

Biometric Fingerprint Spoofing Dataset contains 5,000+ high-quality fingerprint images capturing real fingerprints and multiple spoofing fingerprint attack types, including print and replay scenarios. Designed for spoofing detection and liveness detection tasks, the fingerprint dataset provides labeled biometric data from different devices and fingers to train and evaluate biometric security and fingerprint recognition systems.

100 People
5,000+ Photos
Commercial
- Facial Recognition
- Liveness Detection
- Security
- Anti-spoofing
- Computer Vision
Anti-Spoofing Replay PC Videos Dataset

This is a high-quality replay attack dataset containing 4,714 PC-recorded video clips of real faces, designed for training and evaluating face recognition and liveness detection systems. This anti-spoofing videos dataset includes diverse attack scenarios, demographic metadata (age, gender, ethnicity), and MP4/MOV formats to support spoofing detection, biometric security, and computer vision model development.

4,714 Videos
4,714 People
Commercial
- Facial Recognition
- Liveness Detection
- Security
- Anti-spoofing
- Computer Vision
Anti-Spoofing Replay Phone Videos Dataset

This anti-spoofing dataset contains over 38,000 live facial video recordings captured on mobile devices to support replay attack detection and biometric anti-spoofing research. With paired video sets, MP4/MOV formats, and rich metadata such as age, gender, and ethnicity, it provides reliable training data for face antispoofing, liveness detection, and secure biometric authentication systems.

38,029 Videos
20,018 Sets
Commercial
- Speech Analysis
- ASR
- Machine learning
- Data generation
- Audio Processing
Real vs Fake Human Voice – Deepfake Audio Dataset

Real vs Fake Human Voice – Deepfake Audio Dataset contains 5,000 audio files featuring both genuine human recordings and AI-generated voice samples. Each set includes four speakers with multiple clips across M4A and MP3 formats. The dataset supports research in deepfake detection, generated speech analysis, and real vs fake human voice recognition tasks.

5,000 Audio
Commercial
- Facial Recognition
- Security
- Anti-spoofing
- Computer Vision
- Machine Learning
Kids Anti-Spoofing Dataset

Kids Anti-Spoofing Dataset provides 6,000 high-quality facial images of children aged 7–15 for face anti-spoofing and liveness detection tasks. This child safety dataset supports research in biometric systems, helping improve facial recognition accuracy, detect spoofing attacks, and build safer AI models for protecting kids in digital and identification environments.

6,000 Images
300 People
Commercial
- Image Processing
- Machine Learning
- Hand Recognition
- Forensics
- Computer Vision
Open Palm Hand Images Dataset

This high-quality open palm dataset includes 500,000 annotated images collected from 50,000 people, with each set containing six palm photos, two printed-hand images, and two replay videos. Designed for hand recognition and computer vision research, it provides detailed metadata: age, gender, ethnicity, profession, device type, dominant hand, and jewelry status.

500,000 Images
50,000 People
Commercial
- PII
- Data generation
- Security
- Anti-spoofing
- Computer Vision
Synthetic Printed Turkish Passports Dataset

This synthetic Turkish passport dataset contains 5,000 high-quality, AI-generated images. Labeled with detailed metadata, including passport ID, class, gender, and lighting, it supports PII extraction, identity verification, and biometric recognition system training while maintaining strict data protection standards.

5,000 Images
Commercial
- Facial Recognition
- iBeta
- Liveness Detection
- Security
- Anti-spoofing
- Computer Vision
iBeta Kids Dataset

iBeta Kids Dataset is a child safety dataset featuring more than 45,000 short video samples of children across different age groups, recorded under varied lighting, devices, and conditions. It includes four main sample types (Real Person, 2D Mask, 3D Mask, and Replay), helping develop biometric systems that detect spoofing and ensure safe, accurate child identification.

45,600 Videos
60 People
Commercial
- PII
- Data generation
- Security
- Anti-spoofing
- Computer Vision
Synthetic Printed German Passports Dataset

This German passport dataset provides 5,000 AI-generated synthetic passport images, engineered for training and benchmarking ML models in document analysis and PII extraction. It features high-resolution JPG samples with controlled variations across 3 angles, 4 lighting conditions, and 4 backgrounds, each annotated with detailed metadata including passport ID, gender, and age group for robust model development.

5,000 Images

Why Companies Trust Unidata's Datasets

Share your project requirements and we handle the rest. Every service is tailored, executed, and compliance-ready, so you can focus on strategy and growth, not operations.

70+ Datasets

Finance, IT, E-commerce, Retail, Healthcare and 14+ Industries
Multiple supported formats

Unique & Diverse Data

Diversity in ethnicity, age, country, gender, and more
Exclusively collected data, not available from open sources

Custom Dataset Solutions

No manual collection needed from your side; we handle everything
Up to 70% cheaper than in-house

100% Legal, Secure & Compliant

Curated and legally sourced
Stored on AWS; compliant with ISO 27001 and ISO 27701

Smooth Collaboration & Fast Delivery

87% of datasets delivered in 3–10 days
Dedicated PM, Europe-timezone communication

Need Proof?

See the results we've delivered for leading tech companies and startups.


What our clients are saying

UniData


Paul 2025-02-21

Very Positive Experience!

The team was very responsive when requesting a specific dataset, and was able to work with us on what data we specifically needed and custom pricing for our use case. Overall a great experience, and would recommend them to others!

Thorsten 2025-01-09

Very good experience

We got in touch with UniData to buy several datasets from them. Communication was very cooperative, quick, and friendly. We were able to find contract conditions that suited both parties well. I also appreciate the team's dedication to understand and address the needs of the customer. And the datasets we bought from UniData matched with our expectations.

Max Crous 2024-10-08

Data purchase

Our team got in touch with UniData for purchasing video data. The team at UniData was transparent, timely, and pleasant to communicate and negotiate with. Their samples and descriptions aligned well with the data we received. We will certainly reach out to UniData again if we're in search of 3rd party video data.

Abhijeet Zilpelwar 2025-02-26

Data is well organized and easy to…

Data is well organized and easy to consume. We could download and use it for training within few hours of receiving the data links.


Our Clients Love Us

Enterprise Document Automation

Document AI Lead

The dataset gave us strong value for both pilot and early-stage testing. We plan to broaden coverage as deployment scales.

Identity Verification Lab

Deputy Director

The data was good. We passed PAD level 1 from iBeta.
