
Commercial

Speech Emotion Recognition Dataset

Speech Emotion Recognition Dataset comprises over 30,000 audio recordings labeled with four distinct speech emotions: euphoria, joy, sadness, and surprise. It is designed to train emotion recognition and speech recognition systems using rich audio features, human-labeled metadata, and diverse emotional expressions for advanced machine learning and sentiment analysis tasks.

Audio recordings: 30,000+
Emotions: 4

Emotion Recognition
Speech Analysis
Audio
ASR
NLP
Machine learning


Dataset Info

Characteristic	Data
Description	Dataset of audio recordings featuring 4 distinct emotions
Data types	Audio
Tasks	Emotion recognition, NLP
Total number of files	30,000+
Emotion	Euphoria, joy, sadness, and surprise
Labeling	Annotation (text content, gender, age and country)
Gender	Male, Female
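
The annotation fields listed in the table above could be held in a small typed record and sanity-checked before training. This is only a minimal sketch: the class name `SampleAnnotation`, the field names, and the `validate` helper are assumptions for illustration and are not the dataset's published schema.

```python
from dataclasses import dataclass
from typing import List

# Emotion classes and gender values as listed in the Dataset Info table.
EMOTIONS = {"euphoria", "joy", "sadness", "surprise"}
GENDERS = {"male", "female"}


@dataclass(frozen=True)
class SampleAnnotation:
    """Hypothetical per-recording annotation; field names are assumed."""
    text: str      # transcribed text content
    gender: str
    age: int
    country: str
    emotion: str


def validate(record: SampleAnnotation) -> List[str]:
    """Return a list of problems found in the record (empty if none)."""
    problems = []
    if record.emotion.lower() not in EMOTIONS:
        problems.append(f"unknown emotion: {record.emotion!r}")
    if record.gender.lower() not in GENDERS:
        problems.append(f"unknown gender: {record.gender!r}")
    if record.age <= 0:
        problems.append(f"implausible age: {record.age}")
    if not record.text.strip():
        problems.append("empty text content")
    return problems
```

Running `validate` over every record before training is a cheap way to catch labels outside the four documented emotion classes.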

Technical Characteristics

Characteristic	Data
Audio Format	WAV, MPEG, AMR
Recording condition	Low background noise
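
Because the recordings come in several audio formats, a quick inventory by file extension can help plan any conversion step. The sketch below assumes a particular extension-to-format mapping (for example that MPEG audio is delivered as `.mp3` or `.mpeg`); the source does not state which extensions the files actually use, and `count_formats` is a hypothetical helper.

```python
from collections import Counter
from pathlib import Path

# Assumed mapping from file extension to the formats named in the table above.
# The extensions actually used in the delivered files are not documented here.
EXTENSION_TO_FORMAT = {
    ".wav": "WAV",
    ".mp3": "MPEG",
    ".mpeg": "MPEG",
    ".amr": "AMR",
}


def count_formats(filenames):
    """Count files per audio format; unrecognised extensions go under 'other'."""
    counts = Counter()
    for name in filenames:
        ext = Path(name).suffix.lower()
        counts[EXTENSION_TO_FORMAT.get(ext, "other")] += 1
    return counts
```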

Source and collection methodology: Data was collected via crowdsourcing platforms.

Dataset Use Cases

Artificial Intelligence & Machine Learning
Training Models for Emotion Detection in Speech

Speech Emotion Recognition Dataset provides high-quality audio recordings and speech signals labeled with distinct emotion classes. It serves as training data for machine learning and deep learning models that perform emotion classification. The samples are balanced across positive and negative emotions in natural speech.
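
Class balance is easy to verify once the labels are loaded. The following is a minimal sketch, assuming the emotion labels are available as a plain list of strings; `class_balance` is a hypothetical helper, not part of any Unidata tooling.

```python
from collections import Counter


def class_balance(labels):
    """Return per-class counts and the max/min count ratio.

    A ratio of 1.0 means every class has the same number of samples;
    larger values indicate stronger imbalance.
    """
    counts = Counter(labels)
    if not counts:
        raise ValueError("no labels given")
    ratio = max(counts.values()) / min(counts.values())
    return counts, ratio
```

If the ratio is well above 1.0 for a given split, class weighting or resampling may be worth considering before training.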
Human-Computer Interaction & Voice Assistants
Enhancing Empathy in Voice-Driven Systems

This dataset helps developers build recognition systems that understand human emotions from speech signals. By analyzing audio features such as tone, pitch, and rhythm, voice assistants and conversational agents can respond with greater sensitivity to emotional expressions. The dataset enables more natural and context-aware speech recognition applications.
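
As an illustration of the kind of audio features mentioned above, the sketch below computes two simple ones from raw samples: RMS energy (a loudness proxy) and a pitch estimate based on autocorrelation. It is a simplified, pure-Python example under the assumption that the audio has already been decoded to a list of floats; production systems typically rely on dedicated signal-processing libraries, and rhythm or tone features are not covered here.

```python
import math


def rms_energy(samples):
    """Root-mean-square amplitude of the signal, a simple loudness proxy."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def estimate_pitch_hz(samples, sample_rate, fmin=50.0, fmax=500.0):
    """Estimate the fundamental frequency via autocorrelation.

    Searches lags corresponding to frequencies between fmin and fmax
    (an assumed range for speech) and returns the frequency of the lag
    with the highest autocorrelation score.
    """
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(samples) - 1)
    if min_lag > max_lag:
        raise ValueError("signal too short for the requested pitch range")
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(
            samples[i] * samples[i + lag] for i in range(len(samples) - lag)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag
```

Features like these would normally be computed per short frame (for example 20 to 50 ms) and then fed to a classifier together with the emotion label.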
Customer Experience & Sentiment Analysis
Improving Emotion-Aware Analytics in Call Centers

Organizations use this emotion detection dataset to develop sentiment analysis tools that assess emotions expressed in customer calls. It contains labeled audio files representing diverse emotional tones, which can support classification methods aimed at recognizing states such as frustration, satisfaction, or neutrality, although these are not among the dataset's four labeled classes. Such models enhance quality monitoring and customer satisfaction analysis in speech-based communication systems.
Academic Research & Multimodal Emotion Studies
Benchmarking Models for Audio Emotion Classification

Researchers utilize this Speech Emotion Recognition Dataset to study multimodal emotion detection and speech emotions across languages and demographics. The corpus contains annotated audio samples with defined acoustic features, making it ideal for evaluating pre-trained models and emotion recognition algorithms. It supports comparative analysis between audio data types, fostering advancements in speech-based emotion recognition research.

FAQs

What is Speech Emotion Recognition Dataset used for?

This dataset is primarily used for emotion recognition, sentiment analysis, and speech-based AI research. It helps in building and fine-tuning emotion detection models for applications such as virtual assistants, customer interaction systems, and human-computer interaction technologies.

What is included in this dataset?

The dataset contains over 30,000 audio recordings of human speech expressing four distinct emotions: euphoria, joy, sadness, and surprise. Each sample includes detailed metadata annotations such as text content, gender, age, and country of the speaker to support multimodal emotion classification tasks.

Can I request a sample of the dataset before purchasing or downloading it?

Yes. Unidata provides free sample data for evaluation and testing. The sample includes a subset of audio recordings with labeled emotions, helping you assess the quality, file formats, and annotation structure before purchasing the complete dataset.

How was the data collected?

The audio recordings were collected using crowdsourcing platforms. All recordings were performed under low background noise conditions, producing high-quality speech signals.

How are Unidata datasets licensed?

Unidata datasets follow a dual-licensing model: free dataset samples are offered for testing and validation, while full datasets are available for purchase. This ensures users can evaluate audio quality and labeling accuracy before acquiring the full speech dataset.

Do Unidata datasets follow GDPR or other data privacy regulations?

Yes. All Unidata datasets are curated in accordance with GDPR and relevant international privacy standards. Data collection is conducted through ethically approved sources, ensuring anonymized and lawful handling of speaker information across all regions.

How are Unidata datasets stored?

All datasets are securely stored on AWS cloud infrastructure, which ensures scalability, reliability, and compliance with ISO 27001 and ISO 27701 standards. This guarantees a privacy-focused and high-availability environment for managing sensitive audio data and speech recordings.

Is this a real-world dataset or synthetic data?

This is a real-world speech dataset, containing genuine audio recordings of human speakers expressing natural emotions. No synthetic or AI-generated voices are included, ensuring that all audio samples reflect authentic emotional speech patterns for realistic model training.


Unidata Cases

Digital Tree Passport Annotation for Forest Mapping

Forestry Monitoring & GIS
200,000 trees, 10 species classes
2 months


License Plate Annotation for Vehicle Recognition System

100,000 images with detailed license plate markup (bounding boxes, digits, regional symbols)
2 weeks


Sentiment Annotation for Brand Monitoring

Marketing & Consumer Insights
12,000 text samples, 3 sentiment classes (positive, negative, neutral)
3 weeks


Surveillance Video Annotation for Entrance Monitoring

Surveillance & Security
90 minutes of video from three cameras, approximately 50-60 thousand frames
2 weeks


Similar Datasets

Commercial
- Computer Vision
- Machine Learning
- Image Processing
- Security
- Anti-Spoofing
Multi-Material Fingerprint Spoofing Dataset

Multi-Material Fingerprint Spoofing Dataset contains 4,000+ fingerprint images from 100 individuals, captured with a ZKTeco ZK9500 optical scanner and including real fingerprints and spoofing attacks created with alginate, plasticine, and silicone materials. The fingerprint dataset includes metadata (gender, age, finger, hand, device) and supports biometric security research, presentation attack detection, spoof detection, and fingerprint recognition model training.

100 People
4,000+ Photos
Commercial
- Computer Vision
- Machine Learning
- Image Processing
- Security
- Anti-Spoofing
Biometric Fingerprint Spoofing Dataset

Biometric Fingerprint Spoofing Dataset contains 5,000+ high-quality fingerprint images capturing real fingerprints and multiple spoofing fingerprint attack types, including print and replay scenarios. Designed for spoofing detection and liveness detection tasks, the fingerprint dataset provides labeled biometric data from different devices and fingers to train and evaluate biometric security and fingerprint recognition systems.

100 People
5,000+ Photos
Commercial
- Facial Recognition
- Liveness Detection
- Security
- Anti-spoofing
- Computer Vision
Anti-Spoofing Replay PC Videos Dataset

This is a high-quality replay attack dataset containing 4,714 PC-recorded video clips of real faces, designed for training and evaluating face recognition and liveness detection systems. This anti-spoofing videos dataset includes diverse attack scenarios, technical metadata (age, gender, ethnicity), and MP4/MOV formats to support spoofing detection, biometric security, and computer vision model development.

4,714 Videos
4,714 People
Commercial
- Facial Recognition
- Liveness Detection
- Security
- Anti-spoofing
- Computer Vision
Anti-Spoofing Replay Phone Videos Dataset

This anti-spoofing dataset contains over 38,000 live facial video recordings captured on mobile devices to support replay attack detection and biometric anti-spoofing research. With paired video sets, MP4/MOV formats, and rich metadata such as age, gender, and ethnicity, it provides reliable training data for face antispoofing, liveness detection, and secure biometric authentication systems.

38,029 Videos
20,018 Sets
Commercial
- Speech Analysis
- ASR
- Machine learning
- Data generation
- Audio Processing
Real vs Fake Human Voice – Deepfake Audio Dataset

Real vs Fake Human Voice – Deepfake Audio Dataset contains 5,000 audio files featuring both genuine human recordings and AI-generated voice samples. Each set includes four speakers with multiple clips across M4A and MP3 formats. The dataset supports research in deepfake detection, generated speech analysis, and real vs fake human voice recognition tasks.

5,000 Audio
Commercial
- Facial Recognition
- Security
- Anti-spoofing
- Computer Vision
- Machine Learning
Kids Anti-Spoofing Dataset

Kids Anti-Spoofing Dataset provides 6,000 high-quality facial images of children aged 7–15 for face anti-spoofing and liveness detection tasks. This child safety dataset supports research in biometric systems, helping improve facial recognition accuracy, detect spoofing attacks, and build safer AI models for protecting kids in digital and identification environments.

6,000 Images
300 People
Commercial
- Image Processing
- Machine Learning
- Hand Recognition
- Forensics
- Computer Vision
Open Palm Hand Images Dataset

This high-quality open palm dataset includes 500,000 annotated images collected from 50,000 people, with each set containing six palm photos, two printed-hand images, and two replay videos. Designed for hand recognition and computer vision research, it provides detailed metadata: age, gender, ethnicity, profession, device type, dominant hand, and jewelry status.

500,000 Images
50,000 People
Commercial
- PII
- Data generation
- Security
- Anti-spoofing
- Computer Vision
Synthetic Printed Turkish Passports Dataset

This synthetic Turkish passport dataset contains 5,000 high-quality, AI-generated images. Labeled with detailed metadata (including passport ID, class, gender, and lighting), it supports PII extraction, identity verification, and biometric recognition system training while maintaining strict data protection standards.

5,000 Images
Commercial
- Facial Recognition
- iBeta
- Liveness Detection
- Security
- Anti-spoofing
- Computer Vision
iBeta Kids Dataset

iBeta Kids Dataset is a child safety dataset featuring over 46,000 short video samples of children across different age groups, recorded under varied lighting, devices, and conditions. It includes four main sample categories (Real Person, 2D Mask, 3D Mask, and Replay), helping develop biometric systems that detect spoofing and ensure safe, accurate child identification.

45,600 Videos
60 People
Commercial
- PII
- Data generation
- Security
- Anti-spoofing
- Computer Vision
Synthetic Printed German Passports Dataset

This German passport dataset provides 5,000 AI-generated synthetic passport images, engineered for training and benchmarking ML models in document analysis and PII extraction. It features high-resolution JPG samples with controlled variations across 3 angles, 4 lighting conditions, and 4 backgrounds, each annotated with detailed metadata including passport ID, gender, and age group for robust model development.

5,000 Images

Why Companies Trust Unidata's Datasets

Share your project requirements and we handle the rest. Every service is tailored, executed, and compliance-ready, so you can focus on strategy and growth, not operations.

70+ Datasets

Finance, IT, E-commerce, Retail, Healthcare and 14+ Industries
Multiple supported formats

Unique & Diverse Data

Diversity in ethnicity, age, country, gender, and more
Exclusively collected data, not available from open sources

Custom Dataset Solutions

No manual collection needed from your side; we handle everything
Up to 70% cheaper than in-house

100% Legal, Secure & Compliant

Curated and legally sourced
AWS ISO 27001/27701

Smooth Collaboration & Fast Delivery

87% of datasets delivered in 3–10 days
Dedicated PM, Europe-timezone communication

Need Proof?

See the results we've delivered for leading tech companies and startups.


What our clients are saying

UniData

4 3 Reviews

Paul 2025-02-21

Very Positive Experience!

The team was very responsive when requesting a specific dataset, and was able to work with us on what data we specifically needed and custom pricing for our use case. Overall a great experience, and would recommend them to others!

Thorsten 2025-01-09

Very good experience

We got in touch with UniData to buy several datasets from them. Communication was very cooperative, quick, and friendly. We were able to find contract conditions that suited both parties well. I also appreciate the team's dedication to understand and address the needs of the customer. And the datasets we bought from UniData matched with our expectations.

Max Crous 2024-10-08

Data purchase

Our team got in touch with UniData for purchasing video data. The team at UniData was transparent, timely, and pleasant to communicate and negotiate with. Their samples and descriptions aligned well with the data we received. We will certainly reach out to UniData again if we're in search of 3rd party video data.

Abhijeet Zilpelwar 2025-02-26

Data is well organized and easy to…

Data is well organized and easy to consume. We could download and use it for training within few hours of receiving the data links.


Our Clients Love Us

Enterprise Document Automation

Document AI Lead

The dataset gave us strong value for both pilot and early-stage testing. We plan to broaden coverage as deployment scales.

Identity Verification Lab

Deputy Director

The data was good. We passed PAD level 1 from iBeta.

