Task
The client requested the transcription of 80 hours of audio materials, with a strong focus on accuracy and full alignment between the transcripts and the original recordings. These files contained real conversations and calls with background noise.
A key feature of this project was the absence of pre-labeling. We worked with audio files of varying quality, including challenging cases with noise and overlapping voices.
It was also essential to ensure the synchronization of text with audio, which required close attention — especially in segments where several people spoke simultaneously or where background sounds interfered.
Solution
-
- 01
-
Preparation and workflow organization:
- The data was split into short fragments of 5-15 seconds and uploaded to Label Studio.
- Each annotator received clear instructions on working with audio and accurately capturing the spoken text.
-
- 02
-
Data annotation:
- Annotators carefully listened to each recording and manually transcribed the spoken words.
- Special attention was paid to clarity and understanding in cases of overlapping voices or muffled speech.
-
- 03
-
Quality control:
- All data underwent validation, and feedback was provided on any errors, which were then sent back to annotators for correction.
- A feedback system was used throughout the project to improve accuracy and efficiency.
Results
The project was delivered on time — 80 hours of transcription per month.
Quality was ensured not only through validation but, first and foremost, through effective training and the team’s deep understanding of the guidelines. Validation also played an important role.
The team’s high productivity allowed us to consistently handle the workload without pre-labeling.