Audio Transcription

Multi-Speaker Audio Annotation for Banking

Automated transcription alone couldn’t handle the nuance of real conversations with background noise and interruptions. Our human-in-the-loop workflow ensured every detail was captured and tagged.

Challenge

The project aimed to train models capable of automatically summarizing meetings and accurately distinguishing between different speakers. Our role was to prepare annotated audio data to power an AI bot designed to process and analyze conversations.
The client requested annotation of long audio fragments for model training, requiring precision and attention to detail at every step. Our tasks included:

Segmenting long audio files with exact timestamps (e.g., from 00:01:23 to 00:01:45).
Identifying speakers (e.g., Speaker-1, Speaker-2) and tagging unintelligible speech, breaths, and overlapping voices with dedicated labels.
Transcribing the text following accurate segmentation.
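To make the task list concrete, here is a minimal sketch of how one annotated segment could be represented. It is an illustration only, not the tooling used on the project: the `Segment` class, its field names, and the `parse_timestamp` helper are hypothetical, and timestamps are assumed to be whole seconds in HH:MM:SS form, as in the example above.

```python
from dataclasses import dataclass


def parse_timestamp(ts: str) -> int:
    """Convert an HH:MM:SS string such as '00:01:23' to whole seconds."""
    hours, minutes, seconds = (int(part) for part in ts.split(":"))
    return hours * 3600 + minutes * 60 + seconds


@dataclass(frozen=True)
class Segment:
    """One annotated span of audio (hypothetical structure)."""
    start: int      # seconds from the beginning of the recording
    end: int        # seconds from the beginning of the recording
    label: str      # e.g. "Speaker-1"; names of non-speech labels are assumptions
    text: str = ""  # left empty until the transcription phase

    @property
    def duration(self) -> int:
        return self.end - self.start


segment = Segment(
    start=parse_timestamp("00:01:23"),
    end=parse_timestamp("00:01:45"),
    label="Speaker-1",
)
print(segment.duration)  # 22
```

Keeping `text` empty by default mirrors the order of the tasks: a segment gets its boundaries and label first and its transcript afterwards.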

Project challenges included:

Long audio recordings (ranging from 16 to 60 minutes) with multiple speakers.
The need to tag specific sounds separately.
Strict accuracy requirements: no overlapping segments and precise time boundaries for each.
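The "no overlapping segments" requirement can be checked mechanically. The sketch below is one possible check, under the assumption that each segment is a `(start, end, label)` tuple in seconds and that overlapping speech is covered by a single segment with its own label rather than by two intersecting segments. The function name and the sample values are hypothetical.

```python
from typing import List, Tuple

# (start_seconds, end_seconds, label)
Span = Tuple[float, float, str]


def find_boundary_issues(spans: List[Span]) -> List[str]:
    """Return human-readable descriptions of invalid or overlapping segments."""
    issues: List[str] = []
    ordered = sorted(spans, key=lambda s: (s[0], s[1]))

    latest_end = None    # furthest end time seen so far
    latest_label = None  # label of the segment that reaches it
    for start, end, label in ordered:
        if end <= start:
            issues.append(f"{label}: end {end} is not after start {start}")
        if latest_end is not None and start < latest_end:
            issues.append(
                f"{label} starts at {start}, before {latest_label} ends at {latest_end}"
            )
        if latest_end is None or end > latest_end:
            latest_end, latest_label = end, label
    return issues


spans = [
    (83.0, 105.0, "Speaker-1"),
    (104.0, 110.0, "Speaker-2"),  # starts one second too early
    (110.0, 112.5, "breath"),     # touching boundaries are allowed
]
for issue in find_boundary_issues(spans):
    print(issue)
```

Tracking the furthest end time seen so far, rather than comparing only neighbouring segments, also catches a long segment that swallows several later ones.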

Solution

Preparation and workflow organization:

The project was split into two phases:

Audio segmentation by speaker and sound type.
Transcription of the segmented fragments.

We assembled dedicated teams for each phase: 5 annotators for segmentation and 5 for transcription, minimizing the risk of errors.

Training materials provided to annotators included:

A detailed guideline document with tag examples.
Video tutorials on handling complex cases and avoiding common mistakes.
A Q&A table with clarifications from the client.

We also organized feedback sessions through video reviews of each annotator’s initial work.

Data annotation process:

Annotators marked audio fragments with precise timestamps and assigned the appropriate tags.
A separate team then transcribed the segmented audio, using special tags (e.g., [NAME] for names) to annotate entities within the text.
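Entity tags of this kind lend themselves to an automated sanity check. The following is a minimal sketch, assuming tags are written in square brackets and that the guideline defines a closed set of allowed tags. Only `[NAME]` is mentioned in this case study, so the allowed set here contains just that one tag, and `unknown_tags` is a hypothetical helper name.

```python
import re
from typing import List, Set

# Only [NAME] is named in the case study; a real guideline would list more tags.
ALLOWED_TAGS: Set[str] = {"NAME"}

# Matches anything written in square brackets and captures the tag name.
TAG_PATTERN = re.compile(r"\[([^\[\]]*)\]")


def unknown_tags(transcript: str, allowed: Set[str] = ALLOWED_TAGS) -> List[str]:
    """Return bracketed tags in the transcript that are not in the allowed set."""
    return [tag for tag in TAG_PATTERN.findall(transcript) if tag not in allowed]


print(unknown_tags("Hello [NAME], this is [name] from [BANK]."))
# ['name', 'BANK']
```

The comparison is deliberately case-sensitive, so a mistyped `[name]` is reported instead of being silently accepted.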

Quality control:

We implemented a validation system with step-by-step checks for every file.
Validators documented all issues in tracking tables with examples and explanations.
In complex cases, a helpdesk was used for quick alignment with the client.
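A tracking table like this becomes more useful when recurring issues are counted per annotator. The sketch below shows one way to do that. The column names (`file`, `annotator`, `type`), the issue types, and every row are invented for illustration and are not project data.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Invented sample rows; a real tracking table would also hold examples and explanations.
issue_log: List[Dict[str, str]] = [
    {"file": "rec_01", "annotator": "A1", "type": "overlap"},
    {"file": "rec_02", "annotator": "A1", "type": "overlap"},
    {"file": "rec_01", "annotator": "A2", "type": "wrong_speaker"},
    {"file": "rec_03", "annotator": "A1", "type": "boundary_drift"},
    {"file": "rec_04", "annotator": "A2", "type": "wrong_speaker"},
    {"file": "rec_02", "annotator": "A3", "type": "overlap"},
]


def recurring_issues(
    rows: List[Dict[str, str]], min_count: int = 2
) -> Dict[Tuple[str, str], int]:
    """Count issues per (annotator, type) and keep those seen at least min_count times."""
    counts = Counter((row["annotator"], row["type"]) for row in rows)
    return {key: n for key, n in counts.items() if n >= min_count}


print(recurring_issues(issue_log))
# {('A1', 'overlap'): 2, ('A2', 'wrong_speaker'): 2}
```

Grouping by annotator and issue type points feedback at a specific habit rather than at a single file.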

Stage	Input	Workflow scope	Quality checks
Project setup	Client requirements, platform specs	Integration, task flow design, access configuration	System connectivity; Task logic consistency
Participant onboarding	Annotator pool (5 segmentation + 5 transcription)	Recruitment, onboarding, instruction delivery, video reviews of initial work	Participant diversity; Instruction clarity
Segmentation	Raw audio files	Speaker & sound-type segmentation with precise timestamps; tag assignment (e.g. [NAME])	Boundary precision; No overlap
Transcription	Segmented audio fragments	Text transcription with entity tags; separate team from segmentation to minimize error risk	Tag correctness; Transcript accuracy
Validation & QC	Annotated files	Step-by-step file validation; issue documentation in tracking tables; helpdesk for complex cases	Data completeness; Result consistency
Reporting & iteration	Validated datasets	Weekly reporting, feedback loops, system improvement tracking	Trend accuracy; Continuous alignment

Pilot & setup: 2 weeks
Participant onboarding: 3 weeks
Annotation & iteration: ongoing
Monitoring & reporting: weekly, ongoing

The Results

We delivered precise segmentation of 20 hours of complex audio.
Thanks to our two-step workflow and robust validation system, we achieved high-quality annotation that met the client’s requirements.

With recordings up to 60 minutes and four or five speakers talking over each other, the real risk wasn't missing a word; it was a one-second boundary error cascading through every downstream segment. Splitting segmentation and transcription into two independent teams was the decision that made everything else work. Validators weren't just catching mistakes: they were documenting patterns, which meant annotators stopped making the same errors by week two.

Vladislav Barsukov: Head of SLM&LLM Annotation

