Text Labeling

Chat Message Annotation for Toxic Content Filtering

Image

Predicting the right reply isn’t just about words — it’s about tone, context, and timing. Our annotation work made AI messaging sound more human.

Image

Client Request

Every day, thousands of buyers ask the same questions — and sellers can’t always keep up.

To automate these routine conversations without losing the human touch, a major classified platform turned to Unidata. The client aimed to develop an AI-powered system capable of predicting suggested replies in user-to-user chats. The goals were to:

  • Streamline conversations between sellers and buyers
  • Improve message relevance and clarity
  • Reduce the risk of inappropriate or offensive messages

Unidata was engaged to provide high-quality data annotation for training the model. After reviewing and refining the client’s technical documentation, we initiated the project.

Our Approach

Technical Scope and Pilot Phase

The client supplied detailed guidelines outlining annotation requirements. Our team reviewed the instructions and proposed clarifications to better align the process with linguistic and contextual nuances.

During the pilot phase, we focused on:

  • Addressing questions related to linguistic accuracy and stylistic tone
  • Ensuring text suggestions reflected correct grammar and spelling
  • Maintaining a conversational, informal style appropriate for peer-to-peer messaging
  • Aligning all outputs with the norms of Russian language usage and the expectations of the platform’s user base

Annotation and Review Process

Our annotation team evaluated and labeled each suggested reply based on the following key criteria:

  • Relevance to the user’s message
  • Absence of provocative or offensive content
  • Contextual accuracy within the flow of conversation
  • Grammatical and stylistic correctness, including informal phrasing typical for chat communication

This required attention to detail across tone, punctuation, and naturalness of expression.

Validation Workflow

To ensure the highest accuracy, each batch of annotated suggestions underwent mandatory validation. Our validation process included:

  • Selecting representative data samples for quality control
  • Actively raising clarification requests and edge cases with project leads
  • Sharing productivity and quality statistics per annotator with team managers

We placed particular emphasis on validator performance by:

  • Involving the training team to improve validator skill levels
  • Providing targeted learning resources and quality feedback loops
StageInputWorkflow ScopeMain Quality Checks
Project SetupClient guidelines & chat dataInstruction review, clarification, tone alignmentGuideline clarity / linguistic consistency
Pilot PhaseSample conversationsTesting annotation logic, resolving edge casesTone accuracy / ambiguity reduction
AnnotationChat messages & reply suggestionsLabeling relevance, safety, tone, grammarContext alignment / toxicity filtering
Linguistic ControlAnnotated responsesInformal style, natural phrasing validationFluency / conversational realism
Validation & QAAnnotated batchesSampling, validator review, escalation of edge casesAccuracy / policy compliance
Feedback LoopQA resultsPerformance tracking, annotator feedbackError reduction / consistency
Training & SupportValidatorsOngoing training, targeted improvementsValidator accuracy
Final DeliveryValidated datasetPackaging and handoffDataset readiness / deployment quality
Project Setup & Guideline Alignment
1 week
Pilot Phase & Linguistic Calibration
2 weeks
Annotation + Validation Phase
2 weeks
Final Evaluation & Delivery
1 week

The Results

  • The model trained on the annotated dataset was successfully deployed.
  • Internal client testing was conducted using real-time user dialogs to assess the accuracy and appropriateness of predicted replies. Early results showed high-quality, context-aware suggestions, no inappropriate topics or formulations, and natural tone suited for real conversations
  • In a dedicated testing session, the client team manually evaluated the model’s responses in a live test environment. The system returned neutral, context-appropriate suggestions that avoided escalation or policy violations.
In conversational AI, the hardest part isn’t detecting toxicity. It’s generating responses that are neutral, context-aware, and still sound human. That balance only comes from carefully annotated real dialogue.
Vladislav Barsukov
Vladislav Barsukov
Head of SLM&LLM Annotation

Similar Cases

  • Image
    NLP Annotation services

    Expert Financial Data Annotation for AI

    CFA-level cases, multi-step calculations, and professional English, all at once. 20–25% hiring conversion, no in-house domain expertise on the ops side. How do you maintain expert consistency when the domain leaves no room for approximation?

    Lean more
  • Image
    Video Annotation

    Surveillance Video Annotation for Entrance Monitoring

    To train violence detection models, synthetic-looking footage is not enough. We created 200 realistic conflict scenarios with complex movement, occlusions, and crowded environments using multi-camera 4K recording.

    Lean more
  • Image
    Text Labeling

    Sentiment Annotation for Brand Monitoring

    We built a scalable sentiment annotation pipeline that handles sarcasm, ambiguity, and domain-specific nuance — enabling smarter brand analysis and customer insight.

    Lean more
  • Image
    Image Annotation

    License Plate Annotation for Vehicle Recognition System

    How do you annotate 100,000 license plates with dozens of nuances — from Arabic characters to regional codes — and still meet a two-week deadline?

    Lean more
  • Image
    NLP Annotation services

    Mathematical Reasoning Validation for AI

    3,500 math problems, three difficulty levels, every solution step checked, not just the final answer. We brought in olympiad students and university instructors to stress-test model logic.

    Lean more

Ready to get started?

Tell us what you need — we’ll reply within 24h with a free estimate

    What service are you looking for? *
    What service are you looking for?
    Data Labeling
    Data Collection
    Ready-made Datasets
    Human Moderation
    Medicine
    Other
    What's your budget range? *
    What's your budget range?
    < $1,000
    $1,000 – $5,000
    $5,000 – $10,000
    $10,000 – $50,000
    $50,000+
    Not sure yet
    • United States+1
    • United Kingdom+44
    • Afghanistan (‫افغانستان‬‎)+93
    • Albania (Shqipëri)+355
    • Algeria (‫الجزائر‬‎)+213
    • American Samoa+1684
    • Andorra+376
    • Angola+244
    • Anguilla+1264
    • Antigua and Barbuda+1268
    • Argentina+54
    • Armenia (Հայաստան)+374
    • Aruba+297
    • Australia+61
    • Austria (Österreich)+43
    • Azerbaijan (Azərbaycan)+994
    • Bahamas+1242
    • Bahrain (‫البحرين‬‎)+973
    • Bangladesh (বাংলাদেশ)+880
    • Barbados+1246
    • Belarus (Беларусь)+375
    • Belgium (België)+32
    • Belize+501
    • Benin (Bénin)+229
    • Bermuda+1441
    • Bhutan (འབྲུག)+975
    • Bolivia+591
    • Bosnia and Herzegovina (Босна и Херцеговина)+387
    • Botswana+267
    • Brazil (Brasil)+55
    • British Indian Ocean Territory+246
    • British Virgin Islands+1284
    • Brunei+673
    • Bulgaria (България)+359
    • Burkina Faso+226
    • Burundi (Uburundi)+257
    • Cambodia (កម្ពុជា)+855
    • Cameroon (Cameroun)+237
    • Canada+1
    • Cape Verde (Kabu Verdi)+238
    • Caribbean Netherlands+599
    • Cayman Islands+1345
    • Central African Republic (République centrafricaine)+236
    • Chad (Tchad)+235
    • Chile+56
    • China (中国)+86
    • Christmas Island+61
    • Cocos (Keeling) Islands+61
    • Colombia+57
    • Comoros (‫جزر القمر‬‎)+269
    • Congo (DRC) (Jamhuri ya Kidemokrasia ya Kongo)+243
    • Congo (Republic) (Congo-Brazzaville)+242
    • Cook Islands+682
    • Costa Rica+506
    • Côte d’Ivoire+225
    • Croatia (Hrvatska)+385
    • Cuba+53
    • Curaçao+599
    • Cyprus (Κύπρος)+357
    • Czech Republic (Česká republika)+420
    • Denmark (Danmark)+45
    • Djibouti+253
    • Dominica+1767
    • Dominican Republic (República Dominicana)+1
    • Ecuador+593
    • Egypt (‫مصر‬‎)+20
    • El Salvador+503
    • Equatorial Guinea (Guinea Ecuatorial)+240
    • Eritrea+291
    • Estonia (Eesti)+372
    • Ethiopia+251
    • Falkland Islands (Islas Malvinas)+500
    • Faroe Islands (Føroyar)+298
    • Fiji+679
    • Finland (Suomi)+358
    • France+33
    • French Guiana (Guyane française)+594
    • French Polynesia (Polynésie française)+689
    • Gabon+241
    • Gambia+220
    • Georgia (საქართველო)+995
    • Germany (Deutschland)+49
    • Ghana (Gaana)+233
    • Gibraltar+350
    • Greece (Ελλάδα)+30
    • Greenland (Kalaallit Nunaat)+299
    • Grenada+1473
    • Guadeloupe+590
    • Guam+1671
    • Guatemala+502
    • Guernsey+44
    • Guinea (Guinée)+224
    • Guinea-Bissau (Guiné Bissau)+245
    • Guyana+592
    • Haiti+509
    • Honduras+504
    • Hong Kong (香港)+852
    • Hungary (Magyarország)+36
    • Iceland (Ísland)+354
    • India (भारत)+91
    • Indonesia+62
    • Iran (‫ایران‬‎)+98
    • Iraq (‫العراق‬‎)+964
    • Ireland+353
    • Isle of Man+44
    • Israel (‫ישראל‬‎)+972
    • Italy (Italia)+39
    • Jamaica+1876
    • Japan (日本)+81
    • Jersey+44
    • Jordan (‫الأردن‬‎)+962
    • Kazakhstan (Казахстан)+7
    • Kenya+254
    • Kiribati+686
    • Kosovo+383
    • Kuwait (‫الكويت‬‎)+965
    • Kyrgyzstan (Кыргызстан)+996
    • Laos (ລາວ)+856
    • Latvia (Latvija)+371
    • Lebanon (‫لبنان‬‎)+961
    • Lesotho+266
    • Liberia+231
    • Libya (‫ليبيا‬‎)+218
    • Liechtenstein+423
    • Lithuania (Lietuva)+370
    • Luxembourg+352
    • Macau (澳門)+853
    • Macedonia (FYROM) (Македонија)+389
    • Madagascar (Madagasikara)+261
    • Malawi+265
    • Malaysia+60
    • Maldives+960
    • Mali+223
    • Malta+356
    • Marshall Islands+692
    • Martinique+596
    • Mauritania (‫موريتانيا‬‎)+222
    • Mauritius (Moris)+230
    • Mayotte+262
    • Mexico (México)+52
    • Micronesia+691
    • Moldova (Republica Moldova)+373
    • Monaco+377
    • Mongolia (Монгол)+976
    • Montenegro (Crna Gora)+382
    • Montserrat+1664
    • Morocco (‫المغرب‬‎)+212
    • Mozambique (Moçambique)+258
    • Myanmar (Burma) (မြန်မာ)+95
    • Namibia (Namibië)+264
    • Nauru+674
    • Nepal (नेपाल)+977
    • Netherlands (Nederland)+31
    • New Caledonia (Nouvelle-Calédonie)+687
    • New Zealand+64
    • Nicaragua+505
    • Niger (Nijar)+227
    • Nigeria+234
    • Niue+683
    • Norfolk Island+672
    • North Korea (조선 민주주의 인민 공화국)+850
    • Northern Mariana Islands+1670
    • Norway (Norge)+47
    • Oman (‫عُمان‬‎)+968
    • Pakistan (‫پاکستان‬‎)+92
    • Palau+680
    • Palestine (‫فلسطين‬‎)+970
    • Panama (Panamá)+507
    • Papua New Guinea+675
    • Paraguay+595
    • Peru (Perú)+51
    • Philippines+63
    • Poland (Polska)+48
    • Portugal+351
    • Puerto Rico+1
    • Qatar (‫قطر‬‎)+974
    • Réunion (La Réunion)+262
    • Romania (România)+40
    • Russia (Россия)+7
    • Rwanda+250
    • Saint Barthélemy+590
    • Saint Helena+290
    • Saint Kitts and Nevis+1869
    • Saint Lucia+1758
    • Saint Martin (Saint-Martin (partie française))+590
    • Saint Pierre and Miquelon (Saint-Pierre-et-Miquelon)+508
    • Saint Vincent and the Grenadines+1784
    • Samoa+685
    • San Marino+378
    • São Tomé and Príncipe (São Tomé e Príncipe)+239
    • Saudi Arabia (‫المملكة العربية السعودية‬‎)+966
    • Senegal (Sénégal)+221
    • Serbia (Србија)+381
    • Seychelles+248
    • Sierra Leone+232
    • Singapore+65
    • Sint Maarten+1721
    • Slovakia (Slovensko)+421
    • Slovenia (Slovenija)+386
    • Solomon Islands+677
    • Somalia (Soomaaliya)+252
    • South Africa+27
    • South Korea (대한민국)+82
    • South Sudan (‫جنوب السودان‬‎)+211
    • Spain (España)+34
    • Sri Lanka (ශ්‍රී ලංකාව)+94
    • Sudan (‫السودان‬‎)+249
    • Suriname+597
    • Svalbard and Jan Mayen+47
    • Swaziland+268
    • Sweden (Sverige)+46
    • Switzerland (Schweiz)+41
    • Syria (‫سوريا‬‎)+963
    • Taiwan (台灣)+886
    • Tajikistan+992
    • Tanzania+255
    • Thailand (ไทย)+66
    • Timor-Leste+670
    • Togo+228
    • Tokelau+690
    • Tonga+676
    • Trinidad and Tobago+1868
    • Tunisia (‫تونس‬‎)+216
    • Turkey (Türkiye)+90
    • Turkmenistan+993
    • Turks and Caicos Islands+1649
    • Tuvalu+688
    • U.S. Virgin Islands+1340
    • Uganda+256
    • Ukraine (Україна)+380
    • United Arab Emirates (‫الإمارات العربية المتحدة‬‎)+971
    • United Kingdom+44
    • United States+1
    • Uruguay+598
    • Uzbekistan (Oʻzbekiston)+998
    • Vanuatu+678
    • Vatican City (Città del Vaticano)+39
    • Venezuela+58
    • Vietnam (Việt Nam)+84
    • Wallis and Futuna (Wallis-et-Futuna)+681
    • Western Sahara (‫الصحراء الغربية‬‎)+212
    • Yemen (‫اليمن‬‎)+967
    • Zambia+260
    • Zimbabwe+263
    • Åland Islands+358
    Where did you hear about Unidata? *
    Where did you hear about Unidata?
    Andrew
    Head of Client Success

    — I'll guide you through every step, from your first
    message to full project delivery

    Thank you for your
    message

    It has been successfully sent!

    We use cookies to enhance your experience, personalize content, ads, and analyze traffic. By clicking 'Accept All', you agree to our Cookie Policy.