Data Collection

Audio Data Collection for Emotion-Sensitive Voice Systems


We faced a challenging task: collecting 750 unique recordings of children’s laughter, crying, and speech within a month, all while meeting strict quality and diversity requirements. Thanks to a flexible data collection approach, multi-level verification, and well-coordinated teamwork, we successfully met the deadline.


The Task

The client requested 750 unique audio recordings of children's laughter, crying, and speech within one month. Each child could participate only once, so no participant could be recorded more than once. Strict quality and diversity requirements added to the complexity of the task.

The Solution

To ensure an efficient data collection process, we divided it into several stages:

Dataset design and methodology

  • Defined the target age range and prioritized ethnic and regional groups
  • Developed an age-verification approach combining visual assessment and metadata analysis
  • Created clear, standardized instructions for participants and crowd platforms, including capture examples
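The metadata half of the age-verification approach can be illustrated with a small eligibility check. This is a minimal sketch, not the team's actual tooling: it assumes each submission carries a birth date and a recording date, and the bounds `MIN_AGE_YEARS` and `MAX_AGE_YEARS` are hypothetical placeholders, since the target age range is not stated in this case. The visual assessment stays a human step.

```python
from datetime import date

# Hypothetical target range; the real project bounds are not given in the case study.
MIN_AGE_YEARS = 1
MAX_AGE_YEARS = 10


def age_in_years(birth_date: date, recorded_on: date) -> int:
    """Whole years elapsed between birth_date and recorded_on."""
    had_birthday = (recorded_on.month, recorded_on.day) >= (birth_date.month, birth_date.day)
    return recorded_on.year - birth_date.year - (0 if had_birthday else 1)


def is_age_eligible(birth_date: date, recorded_on: date,
                    min_age: int = MIN_AGE_YEARS,
                    max_age: int = MAX_AGE_YEARS) -> bool:
    """True if the child's age at recording time lies within [min_age, max_age]."""
    if birth_date > recorded_on:
        return False  # inconsistent metadata: birth date after recording date
    return min_age <= age_in_years(birth_date, recorded_on) <= max_age
```

Submissions that fail this check, or whose metadata looks inconsistent, would be routed to a reviewer rather than rejected outright.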

Data collection approach

  • A pilot phase using the Yandex.Toloka platform proved to be too slow.
  • We switched to an in-house collection strategy, engaging parents through social media and childcare institutions.
  • To verify the authenticity of the audio, we required submissions in video format to confirm that the laughter, crying, and speech genuinely belonged to a child and that there were no repeated participants.
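Part of enforcing "no repeated participants" can be automated. As a hedged sketch (the function names are hypothetical), hashing each uploaded file catches byte-identical re-submissions. It cannot recognise the same child in a new recording, which is why the video requirement and manual review remain necessary.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def sha256_digest(data: bytes) -> str:
    """Hex SHA-256 of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def find_exact_duplicates(submissions: Iterable[Tuple[str, bytes]]) -> Dict[str, List[str]]:
    """Group submission IDs by content hash and keep only hashes shared by 2+ IDs."""
    by_hash: Dict[str, List[str]] = defaultdict(list)
    for submission_id, payload in submissions:
        by_hash[sha256_digest(payload)].append(submission_id)
    return {digest: ids for digest, ids in by_hash.items() if len(ids) > 1}
```

For example, `find_exact_duplicates([("a", b"x"), ("b", b"y"), ("c", b"x")])` groups submissions `a` and `c` under one hash.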

Data collection

  • Leveraged established crowd platforms and tested new sources to expand geographic coverage
  • Designed simple, engaging tasks to encourage complete, high-quality recordings
  • Provided fair compensation to reduce drop-off and incomplete submissions
  • Monitored incoming data in real time to address quality issues early
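Real-time monitoring usually starts with cheap automated checks on each incoming file. The sketch below assumes 16-bit PCM WAV input and uses only Python's standard library; the thresholds `MIN_DURATION_S` and `MAX_CLIPPED_RATIO` are hypothetical values chosen for illustration, not the project's acceptance criteria.

```python
import array
import io
import sys
import wave
from dataclasses import dataclass

# Hypothetical thresholds for illustration only.
MIN_DURATION_S = 1.0
MAX_CLIPPED_RATIO = 0.01
CLIP_LEVEL = 32767  # full scale for signed 16-bit PCM


@dataclass
class ClipReport:
    duration_s: float
    clipped_ratio: float
    passed: bool


def check_wav(data: bytes) -> ClipReport:
    """Flag recordings that are too short or heavily clipped."""
    with wave.open(io.BytesIO(data), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch supports 16-bit PCM only")
        n_frames = wf.getnframes()
        rate = wf.getframerate()
        raw = wf.readframes(n_frames)
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV stores samples little-endian
    duration = n_frames / rate
    clipped = sum(1 for s in samples if abs(s) >= CLIP_LEVEL)
    ratio = clipped / len(samples) if samples else 0.0
    passed = duration >= MIN_DURATION_S and ratio <= MAX_CLIPPED_RATIO
    return ClipReport(duration, ratio, passed)
```

Files that fail such a check can be returned to the parent with a request to re-record, which keeps problems from reaching the validation stage.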

Validation and quality control

  • Combined automated checks with manual expert review to confirm each child's age and that the recording genuinely belongs to that child
  • Applied multi-layer validation, with multiple reviewers cross-checking each submission
  • Minimized inconsistencies and labeling errors, achieving a very low inaccuracy rate
  • Delivered a clean, production-ready dataset suitable for model training and research
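Cross-checking by multiple reviewers can be reduced to an agreement rule. In this minimal sketch, which is assumed rather than taken from the project, each reviewer assigns one of the three target categories (laughter, crying, speech); a submission is accepted only when a large enough share of reviewers agree, and otherwise it is escalated for expert review. The two-thirds threshold is a hypothetical default.

```python
from collections import Counter
from typing import List, Optional, Tuple


def consensus(labels: List[str], min_agreement: float = 2 / 3) -> Tuple[Optional[str], float]:
    """Return (majority label, agreement share); the label is None below the threshold."""
    if not labels:
        return None, 0.0
    label, votes = Counter(labels).most_common(1)[0]
    share = votes / len(labels)
    return (label if share >= min_agreement else None), share
```

A `None` label marks the submission for escalation rather than automatic rejection, so disputed recordings still get a final human decision.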

Project stages

Pilot & Setup (1–2 weeks)
  • Input: client requirements for 750 unique child audio recordings
  • Workflow scope: dataset design, methodology definition, age-range targeting, ethnicity and region prioritization, creation of instructions and capture examples
  • Main quality checks: consistency of the age-verification approach (visual + metadata), clarity of task instructions

Participant Onboarding (2–3 weeks)
  • Input: parents and childcare institutions reached via crowd and social channels
  • Workflow scope: participant recruitment and onboarding, delivery of instructions for recording laughter, crying, and speech
  • Main quality checks: participant eligibility (child age compliance), instruction comprehension

Attack Collection & Iteration (ongoing)
  • Input: audio and video submissions from children
  • Workflow scope: transition from an external platform (Yandex.Toloka) to in-house collection, continuous gathering via social media and institutions, enforcement of single participation per child
  • Main quality checks: authenticity of recordings (audio + video confirmation), no participant duplication

Monitoring & Reporting (weekly, ongoing)
  • Input: incoming audio/video submissions
  • Workflow scope: real-time monitoring of submissions, quality tracking, engagement optimization, ongoing iteration of the collection strategy
  • Main quality checks: consistent data quality, early detection of errors and low-quality submissions

Validation & Quality Control
  • Input: collected recordings
  • Workflow scope: automated checks plus manual expert review, multi-reviewer cross-checking, dataset cleaning and final curation
  • Main quality checks: age-confirmation accuracy, identity consistency, labeling correctness, dataset integrity

Final Dataset Delivery
  • Input: validated audio dataset
  • Workflow scope: preparation of a production-ready dataset for training and research use
  • Main quality checks: dataset completeness, reliability, readiness for model training

The Results

  • Achieved high confidence in age accuracy and metadata reliability
  • Identified consistent patterns of facial development across diverse ethnic and regional groups
  • Enabled training for face recognition, anti-fraud systems, and academic research
The main challenge was not just collecting 750 child recordings, but ensuring each submission was truly unique and trustworthy. Switching from platform-based collection to direct engagement with parents was the turning point that allowed us to meet both scale and quality requirements within a month.
Lucy Mamedoff
Data Collection Project Manager

