Commercial
French Speech Recognition Dataset
This speech recognition dataset comprises 547 hours of telephone dialogues in French from 964 native speakers. It provides audio recordings with detailed annotations (text, speaker ID, gender, age) for training and evaluating automatic speech recognition (ASR) systems, natural language processing pipelines, and deep learning models.
- Hours: 547
- Speakers: 964
- Word Accuracy Rate: 98%
- NLP
- LLM
- Machine Learning
- Audio Processing
- ASR
- Voice Recognition
Dataset Info
Characteristic | Data |
Description | Audio of telephone dialogues in French for training NLP models in real-world conversational scenarios. |
Data types | Audio |
Tasks | Speech recognition, NLP |
Country | France (FRA) |
Hours of telephone dialogue | 547 |
Number of speakers | 964 |
Labeling | Annotation (text content, speaker ID, gender, age, and other attributes) |
Gender | Male (41%), Female (59%) |
Recording device | Telephone |
Statistics
- Distribution by gender: Male (41%), Female (59%)
Technical Characteristics
Characteristic | Data |
Audio Format | PCM, A-law/μ-law |
Sampling Rate | 8 kHz |
Number of Channels | Mono |
Recording condition | Low background noise (indoor) |
FAQs
What audio quality and format are provided?
The audio recordings are provided in PCM and A-law/μ-law formats at a sampling rate of 8 kHz. The mono-channel setup and low-noise indoor recording conditions keep speech clear for training and evaluating speech recognition models.
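For readers who need to turn μ-law audio into linear PCM samples, the following is a minimal sketch of the standard G.711 μ-law expansion. It assumes the audio is a raw, headerless mono byte stream with one byte per sample; the page does not state the container format, so that part is an assumption, and the function names are illustrative rather than part of any Unidata tooling.

```python
SAMPLE_RATE_HZ = 8000  # sampling rate stated for this dataset
BIAS = 0x84            # bias constant used by G.711 mu-law coding


def ulaw_byte_to_pcm16(code: int) -> int:
    """Expand one 8-bit mu-law code to a signed 16-bit linear sample."""
    code = ~code & 0xFF                      # mu-law bytes are stored bit-inverted
    magnitude = ((code & 0x0F) << 3) + BIAS  # 4-bit mantissa plus bias
    magnitude <<= (code & 0x70) >> 4         # scale by the 3-bit segment number
    return BIAS - magnitude if code & 0x80 else magnitude - BIAS


def ulaw_to_pcm16(data: bytes) -> list[int]:
    """Decode a buffer of mu-law bytes into a list of PCM samples."""
    return [ulaw_byte_to_pcm16(b) for b in data]


def duration_seconds(data: bytes, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Duration of a headerless mono mu-law buffer (assumed one byte per sample)."""
    return len(data) / sample_rate_hz
```

Under these assumptions, one second of audio occupies 8000 bytes, so `duration_seconds` gives a quick sanity check on file lengths before decoding.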
What accents and speech variations are represented?
The dataset includes a range of French accents and dialects, reflecting natural variation in real-world speech. This diversity helps speech recognition models generalize across different French speech scenarios.
How was the data collected?
The speech samples were collected via controlled telephone calls in France, recorded indoors with low background noise for clear speech signals. Each file was processed to ensure consistent audio quality, a uniform sampling rate (8 kHz), and standardized formats (PCM, A-law/μ-law).
Can I request a sample of the dataset before purchasing or downloading it?
Yes, you can request a sample of the dataset to test audio quality, transcription accuracy, and metadata coverage. Samples allow developers to confirm the dataset meets their needs for deep learning models and automatic speech recognition systems.
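One way to check transcription accuracy on a sample is to compare the provided transcript against an independent reference and compute the word error rate (WER). The sketch below uses a word-level edit distance. It assumes that "word accuracy" is taken as 1 minus WER, which is a common convention but is not defined on this page, and it normalizes text only by lowercasing and splitting on whitespace.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed definition: word accuracy = 1 - WER."""
    return 1.0 - word_error_rate(reference, hypothesis)
```

For example, `word_accuracy("bonjour je voudrais une table", "bonjour je voudrais la table")` counts one substitution over five reference words. Punctuation and number formatting would need their own normalization rules before scores from different sources can be compared.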
Why Choose Us
Unidata offers unparalleled expertise in AI data solutions, delivering superior data quality and optimized workflows
Expertise
Our team consists of industry-leading experts in AI data solutions
Quality
We ensure superior data quality to maximize your AI project's potential
Efficiency
Our optimized workflows accelerate your model training processes
Proven Results
Our track record of case studies demonstrates our ability to deliver outstanding outcomes
Customization
Our track record of case studies demonstrates our ability to deliver outstanding outcomes
Support
We provide ongoing support and consultation to ensure continuous success
- 1000+ full-time assessors