NLP Annotation Services

Advanced Message Filtering for Platform Safety

When user trust is at stake, platforms can’t afford to let harmful messages slip through.

A major classifieds company needed a reliable way to protect conversations on its platform without slowing them down. To make that happen, Unidata provided high-precision annotation and validation for a filtering and classification system that now helps keep millions of daily interactions safe and respectful.

Client Request

Our client, a leading company in the classifieds industry, aimed to build a message filtering system that would:

  • Prevent the spread of inappropriate or restricted content
  • Improve overall conversation quality on the platform
  • Protect users from violations such as:
    • Offensive or abusive language
    • Personal data disclosure
    • Negative or harmful speech

To achieve this, Unidata was brought in to annotate and validate the dataset, providing the foundation for a model that could reliably detect and categorize sensitive content.

Our Approach

Technical Requirements and Pilot Phase

The client provided a detailed technical brief outlining classification requirements. Our team proposed additional refinements to ensure a more precise and layered annotation process.

During the pilot phase, we collaborated closely with the client to:

  • Clarify classification rules for key categories, including:
    • Insults and abusive language
    • Mentions of personal information
    • References to meeting arrangements
    • Negative sentiment directed at the platform
  • Address complex edge cases, such as:
    • Implicit mentions of meeting locations (e.g., vague geographic references without full addresses)

Annotation and Quality Control Process

Our annotation team at Unidata handled classification by carefully considering:

  • Platform-specific communication patterns
  • Informal language use typical in peer-to-peer messaging
  • The context of each message, not just isolated phrases

Messages were annotated across several primary categories:

  • Use of profanities or slurs
  • Disclosure of personal or sensitive information
  • Various forms of direct and indirect insults
  • Mentions of meeting points or negotiation outside the platform
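
To make the multi-label nature of these categories concrete, here is a minimal Python sketch of what a single annotated message might look like. The class, field, and label names are hypothetical illustrations; the case study does not describe the client's actual data format. The sketch only mirrors two points stated above: messages are labeled in conversational context, and they are sorted into the four primary categories.

```python
from dataclasses import dataclass, field
from enum import Enum


class Category(Enum):
    # Hypothetical label names mirroring the four primary categories above
    PROFANITY = "profanity_or_slur"
    PERSONAL_DATA = "personal_or_sensitive_info"
    INSULT = "direct_or_indirect_insult"
    OFF_PLATFORM_MEETING = "meeting_or_offplatform_negotiation"


@dataclass
class AnnotatedMessage:
    message_id: str
    text: str
    # Preceding messages in the same conversation: labels depend on context,
    # not just on isolated phrases
    context: list[str] = field(default_factory=list)
    # Multi-label: one message can fall into several categories at once
    labels: set[Category] = field(default_factory=set)

    def is_clean(self) -> bool:
        """True when no restricted category applies to the message."""
        return not self.labels


msg = AnnotatedMessage(
    message_id="m-001",
    text="Let's meet at the station and skip the app fees",
    context=["Is the bike still available?"],
    labels={Category.OFF_PLATFORM_MEETING},
)
print(msg.is_clean())  # False
print(sorted(label.value for label in msg.labels))  # ['meeting_or_offplatform_negotiation']
```

Storing labels as a set lets one message carry several categories at once, for example an insult that also discloses a phone number.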

Data Validation

To ensure the highest level of annotation accuracy, we implemented a robust validation workflow:

  • Involved experienced validators to review annotated samples
  • Introduced an interactive error analysis process, which included:
    • Team discussions of edge cases
    • Targeted surveys to refine judgment on difficult categories
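
The review step can be sketched as follows. This sketch assumes that validators re-label a random sample of annotated messages and that any item where the two label sets differ is routed to team discussion. The function names, the sampling rate, and the exact-match comparison are illustrative assumptions. The case study does not specify how disagreements were measured.

```python
import random


def sample_for_review(item_ids, rate, seed=0):
    """Draw a reproducible random sample of annotated items for validator review."""
    rng = random.Random(seed)
    k = max(1, round(len(item_ids) * rate))
    return rng.sample(list(item_ids), k)


def review_batch(annotator_labels, validator_labels, sampled_ids):
    """Compare annotator and validator label sets on the sampled items.

    Returns the disagreement rate and the ids of disagreeing items,
    which become candidates for edge-case discussion.
    """
    disagreements = [
        item_id
        for item_id in sampled_ids
        if annotator_labels[item_id] != validator_labels[item_id]
    ]
    rate = len(disagreements) / len(sampled_ids)
    return rate, disagreements


# Hypothetical labels for four messages
annotator = {
    "m1": {"insult"},
    "m2": set(),
    "m3": {"personal_data"},
    "m4": {"meeting"},
}
validator = {
    "m1": {"insult"},
    "m2": set(),
    "m3": {"personal_data", "meeting"},  # validator also spotted a meeting reference
    "m4": {"meeting"},
}

sampled = sample_for_review(list(annotator), rate=1.0)
rate, to_discuss = review_batch(annotator, validator, sampled)
print(rate, to_discuss)  # 0.25 ['m3']
```

Exact set equality is the strictest possible criterion. A real workflow might instead weight disagreements by category, since a missed personal-data label is usually costlier than a borderline tone call.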

We also conducted training and testing sessions with annotators focused on:

  • Eliminating errors in high-complexity cases
  • Aligning the team on annotation logic and edge-case handling
  • Ensuring consistent interpretation of classification criteria

Stage | Input | Workflow Scope | Main Quality Checks
--- | --- | --- | ---
Project Setup | Client guidelines & chat data | Instruction review, clarification, tone alignment | Guideline clarity / linguistic consistency
Pilot Phase | Sample conversations | Testing annotation logic, resolving edge cases | Tone accuracy / ambiguity reduction
Annotation | Chat messages & reply suggestions | Labeling relevance, safety, tone, grammar | Context alignment / toxicity filtering
Linguistic Control | Annotated responses | Informal style, natural phrasing validation | Fluency / conversational realism
Validation & QA | Annotated batches | Sampling, validator review, escalation of edge cases | Accuracy / policy compliance
Feedback Loop | QA results | Performance tracking, annotator feedback | Error reduction / consistency
Training & Support | Validators | Ongoing training, targeted improvements | Validator accuracy
Final Delivery | Validated dataset | Packaging and handoff | Dataset readiness / deployment quality

Project timeline:

  • Project Setup & Guideline Alignment: 1 week
  • Pilot Phase & Linguistic Calibration: 2 weeks
  • Annotation & Validation Phase: 2 weeks
  • Final Evaluation & Delivery: 1 week

The Results

  • The model trained on our annotated data was successfully tested and deployed on the client’s platform. Internal testing evaluated model performance on randomly selected user messages.
  • The initial testing phase showed promising results:
    • The model accurately blocked inappropriate or restricted content.
    • Responses remained contextually appropriate across various scenarios.
In conversational AI, the hardest part isn’t detecting toxicity. It’s generating responses that are neutral, context-aware, and still sound human. That balance only comes from carefully annotated real dialogue.
Vladislav Barsukov
Head of SLM&LLM Annotation
