Client Request
Our client, a leading company in the classifieds industry, aimed to build a message filtering system that would:
- Prevent the spread of inappropriate or restricted content
- Improve overall conversation quality on the platform
- Protect users from violations such as:
  - Offensive or abusive language
  - Personal data disclosure
  - Negative or harmful speech
To achieve this, Unidata was brought in to annotate and validate the dataset, providing the foundation for a model that could reliably detect and categorize sensitive content.
Our Approach
Technical Requirements and Pilot Phase
The client provided a detailed technical brief outlining classification requirements. Our team proposed additional refinements to ensure a more precise and layered annotation process.
During the pilot phase, we collaborated closely with the client to:
- Clarify classification rules for key categories, including:
  - Insults and abusive language
  - Mentions of personal information
  - References to meeting arrangements
  - Negative sentiment directed at the platform
- Address complex edge cases, such as:
  - Implicit mentions of meeting locations (e.g., vague geographic references without full addresses)
Annotation and Quality Control Process
Our annotation team at Unidata handled classification by carefully considering:
- Platform-specific communication patterns
- Informal language use typical in peer-to-peer messaging
- The context of each message, not just isolated phrases
Messages were annotated across several primary categories:
- Use of profanities or slurs
- Disclosure of personal or sensitive information
- Various forms of direct and indirect insults
- Mentions of meeting points or negotiation outside the platform
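As a rough illustration of how the categories above could be represented in an annotation record, here is a minimal Python sketch (Python 3.9+). The names `Category`, `AnnotatedMessage`, and `label_counts` are hypothetical, and the multi-label layout is an assumption made for the example, not a description of the schema actually used on the project.

```python
from dataclasses import dataclass, field
from enum import Enum


class Category(Enum):
    # Illustrative label names mirroring the four categories listed above.
    PROFANITY = "profanity_or_slur"
    PERSONAL_INFO = "personal_or_sensitive_info"
    INSULT = "direct_or_indirect_insult"
    OFF_PLATFORM = "meeting_point_or_off_platform_negotiation"


@dataclass
class AnnotatedMessage:
    message_id: str
    text: str
    # Assumption: a message may carry several labels at once, or none if clean.
    labels: set[Category] = field(default_factory=set)
    annotator_id: str = ""

    def is_flagged(self) -> bool:
        """True if at least one category was assigned."""
        return bool(self.labels)


def label_counts(messages):
    """Count how many messages carry each category."""
    counts = {category: 0 for category in Category}
    for message in messages:
        for label in message.labels:
            counts[label] += 1
    return counts
```

Keeping labels as a set lets a single message be tagged, for example, as both a personal data disclosure and an off-platform arrangement, which suits messages that break more than one rule.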
Data Validation
To ensure the highest level of annotation accuracy, we implemented a robust validation workflow:
- Involved experienced validators to review annotated samples
- Introduced an interactive error analysis process, which included:
  - Team discussions of edge cases
  - Targeted surveys to refine judgment on difficult categories
We also conducted training and testing sessions with annotators focused on:
- Eliminating errors in high-complexity cases
- Aligning the team on annotation logic and edge-case handling
- Ensuring consistent interpretation of classification criteria
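One common way to quantify how consistently two people interpret the same criteria is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The case study does not state which agreement metric, if any, was used, so the sketch below is only an example of the idea; the function name `cohens_kappa` and the sample labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labeled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)
    # Share of items on which the two raters chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical annotator vs. validator labels on four messages.
annotator = ["insult", "clean", "clean", "insult"]
validator = ["insult", "clean", "insult", "insult"]
print(cohens_kappa(annotator, validator))  # 0.5
```

Values near 1 indicate strong agreement, while values near 0 indicate agreement no better than chance. Categories that score low are natural candidates for the team discussions and targeted training described above.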
| Stage | Input | Workflow Scope | Main Quality Checks |
| --- | --- | --- | --- |
| Project Setup | Client guidelines & chat data | Instruction review, clarification, tone alignment | Guideline clarity / linguistic consistency |
| Pilot Phase | Sample conversations | Testing annotation logic, resolving edge cases | Tone accuracy / ambiguity reduction |
| Annotation | Chat messages & reply suggestions | Labeling relevance, safety, tone, grammar | Context alignment / toxicity filtering |
| Linguistic Control | Annotated responses | Informal style, natural phrasing validation | Fluency / conversational realism |
| Validation & QA | Annotated batches | Sampling, validator review, escalation of edge cases | Accuracy / policy compliance |
| Feedback Loop | QA results | Performance tracking, annotator feedback | Error reduction / consistency |
| Training & Support | Validators | Ongoing training, targeted improvements | Validator accuracy |
| Final Delivery | Validated dataset | Packaging and handoff | Dataset readiness / deployment quality |
The Results
- The model trained on our annotated data was successfully tested and deployed on the client’s platform. Internal testing evaluated model performance against randomly selected user messages.
- The initial testing phase showed promising results:
  - The model accurately blocked inappropriate or restricted content
  - Responses remained contextually appropriate across various scenarios
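To make the idea of testing against randomly selected messages concrete, the following Python sketch draws a reproducible random sample and scores blocking decisions with precision and recall. It is an assumed setup for illustration only: the helper names `sample_messages` and `precision_recall` are hypothetical, and the case study does not report which metrics the client's internal testing used.

```python
import random


def sample_messages(messages, k, seed=0):
    """Draw k messages uniformly at random; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(messages, k)


def precision_recall(should_block, was_blocked):
    """Precision and recall of the 'blocked' decision against reference labels.

    should_block: reference labels (True if the message violates a rule).
    was_blocked:  model decisions for the same messages, in the same order.
    """
    pairs = list(zip(should_block, was_blocked))
    true_pos = sum(s and b for s, b in pairs)
    false_pos = sum((not s) and b for s, b in pairs)
    false_neg = sum(s and (not b) for s, b in pairs)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Hypothetical reference labels and model decisions for four sampled messages.
reference = [True, True, False, False]
decisions = [True, False, True, False]
print(precision_recall(reference, decisions))  # (0.5, 0.5)
```

Precision shows how often a blocked message really deserved blocking, and recall shows how many violating messages were caught. Tracking both guards against a filter that looks accurate only because it blocks too much or too little.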
“In conversational AI, the hardest part isn’t detecting toxicity. It’s generating responses that are neutral, context-aware, and still sound human. That balance only comes from carefully annotated real dialogue.”
Vladislav Barsukov, Head of SLM&LLM Annotation