Commercial
British English Speech Recognition Dataset
The dataset consists of 200 hours of high-quality telephone dialogues from 310 native speakers in the UK, with detailed annotations (transcriptions, timestamps, speaker ID, gender, and background noise) to support speech recognition systems, NLP tasks, and machine learning models requiring diverse British English audio datasets.
- Speakers: 310
- Hours: 200
- Sentence Accuracy Rate: 95%
- NLP
- LLM
- Machine Learning
- Audio Processing
- ASR
- Voice Recognition
Dataset Info
Characteristic | Data |
Description | Audio of telephone dialogues in English for training NLP models in real-world conversational scenarios. |
Data types | Audio |
Tasks | Speech recognition, NLP |
Country | The United Kingdom (GBR) |
Hours of telephone dialogue | 200 |
Number of speakers | 310 |
Labeling | Annotation (transcription text, timestamp, speaker ID, gender, noise) |
Gender | Male (42%), Female (58%) |
Recording device | Android smartphone, iPhone |
Statistics
- Distribution by gender: Male (42%), Female (58%)
Technical Characteristics
Characteristic | Data |
Audio Format | Uncompressed WAV |
Sampling Rate | 16 kHz |
Bit Depth | 16-bit |
Number of Channels | Mono |
Recording condition | Low background noise (indoor) |
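
To make the characteristics above concrete, here is a minimal Python sketch that checks whether a WAV file matches the listed format (16 kHz, 16-bit, mono) and estimates the raw audio size for a given number of hours. It uses only the standard-library `wave` module. The function names are illustrative and not part of any Unidata tooling, and the silent test file is generated in memory purely for demonstration.

```python
import io
import wave

# Expected format, taken from the Technical Characteristics table.
EXPECTED_RATE_HZ = 16_000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit
EXPECTED_CHANNELS = 1            # mono


def check_wav(source) -> list[str]:
    """Return a list of mismatches between a WAV file and the expected format.

    `source` may be a path string or a binary file-like object.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE_HZ:
            problems.append(f"sampling rate is {wav.getframerate()} Hz")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"bit depth is {8 * wav.getsampwidth()} bits")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"channel count is {wav.getnchannels()}")
    return problems


def estimated_size_bytes(hours: float) -> int:
    """Raw PCM payload size for the given duration, ignoring WAV headers."""
    seconds = hours * 3600
    return int(seconds * EXPECTED_RATE_HZ
               * EXPECTED_SAMPLE_WIDTH_BYTES * EXPECTED_CHANNELS)


def make_silent_wav(rate_hz: int, channels: int, seconds: float = 0.1) -> io.BytesIO:
    """Build a small in-memory 16-bit WAV of silence, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * channels * int(rate_hz * seconds))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(check_wav(make_silent_wav(16_000, 1)))   # matches the expected format
    print(check_wav(make_silent_wav(44_100, 2)))   # wrong rate and channel count
    print(estimated_size_bytes(200))               # 23040000000 bytes, about 23 GB
```

At these settings one hour of audio is roughly 115 MB of raw samples, so 200 hours comes to about 23 GB before any annotation files are counted.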
Dataset Use Cases
What is included in the British English Speech Recognition Dataset?
This speech corpus includes 200 hours of telephone dialogues from 310 native British English speakers. The dataset consists of audio files in WAV format with annotations such as transcription text, timestamps, speaker ID, gender, and background noise.
What types of annotations are provided?
This dataset includes transcribed dialogues with metadata such as speaker ID, gender, timestamps, and background noise tags.
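
As an illustration of how these annotations might be consumed, the sketch below models one annotated segment and sums talk time per speaker. The page does not publish the annotation file format, so everything here is an assumption: the JSON Lines layout, the field names (`speaker_id`, `gender`, `start_s`, `end_s`, `text`, `noise`), and the sample records are all hypothetical.

```python
import json
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    # All field names are hypothetical; the real schema may differ.
    speaker_id: str
    gender: str      # e.g. "male" or "female"
    start_s: float   # segment start, in seconds from the beginning of the call
    end_s: float     # segment end, in seconds
    text: str        # transcription of the segment
    noise: bool      # True if the segment carries a background-noise tag


def parse_segments(jsonl: str) -> list[Segment]:
    """Parse one JSON object per non-empty line into Segment records."""
    return [Segment(**json.loads(line))
            for line in jsonl.splitlines() if line.strip()]


def talk_time_by_speaker(segments: list[Segment]) -> dict[str, float]:
    """Sum segment durations (in seconds) for each speaker ID."""
    totals: dict[str, float] = defaultdict(float)
    for seg in segments:
        totals[seg.speaker_id] += seg.end_s - seg.start_s
    return dict(totals)


# Made-up sample records, for demonstration only.
SAMPLE = """
{"speaker_id": "spk1", "gender": "female", "start_s": 0.0, "end_s": 2.5, "text": "Hello, is that the surgery?", "noise": false}
{"speaker_id": "spk2", "gender": "male", "start_s": 2.5, "end_s": 4.0, "text": "Yes, speaking.", "noise": true}
{"speaker_id": "spk1", "gender": "female", "start_s": 4.0, "end_s": 7.0, "text": "I'd like to book an appointment.", "noise": false}
"""

if __name__ == "__main__":
    segments = parse_segments(SAMPLE)
    print(talk_time_by_speaker(segments))  # {'spk1': 5.5, 'spk2': 1.5}
```

Keeping timestamps as plain seconds makes it straightforward to slice the matching audio frames at a 16 kHz sampling rate (frame index = seconds × 16000).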
What are the sources of data for Unidata datasets?
Unidata datasets are created through controlled data collection and trusted partnerships. The dataset was recorded by native speakers on smartphones in indoor conditions with low background noise, ensuring high-quality speech signals.
Is it possible to request a custom dataset?
Yes, Unidata offers custom speech datasets tailored to specific needs. You may request additional English accents, speech samples, or recording conditions to train more accurate speech recognition systems or fine-tuned language models.

Why Choose Us
Unidata offers unparalleled expertise in AI data solutions, delivering superior data quality and optimized workflows.
- Expertise: Our team consists of industry-leading experts in AI data solutions.
- Quality: We ensure superior data quality to maximize your AI project's potential.
- Efficiency: Our optimized workflows accelerate your model training processes.
- Proven Results: Our track record of case studies demonstrates our ability to deliver outstanding outcomes.
- Customization: We offer custom datasets tailored to specific needs.
- Support: We provide ongoing support and consultation to ensure continuous success.
- 1000+ full-time assessors