NLP Annotation services

Advanced Message Filtering for Platform Safety

When user trust is at stake, platforms can’t afford to let harmful messages slip through.

A major classifieds company needed a reliable way to protect conversations on their platform — without slowing them down. To make that happen, Unidata provided high-precision annotation and validation for a smart filtering and classification system that now helps keep millions of daily interactions safe and respectful.

Client Request

Our client, a leading company in the classifieds industry, aimed to build a message filtering system that would:

Prevent the spread of inappropriate or restricted content
Improve overall conversation quality on the platform
Protect users from violations such as:
- Offensive or abusive language
- Personal data disclosure
- Negative or harmful speech

To achieve this, Unidata was brought in to annotate and validate the dataset, providing the foundation for a model that could reliably detect and categorize sensitive content.

Our Approach

Technical Requirements and Pilot Phase

The client provided a detailed technical brief outlining classification requirements. Our team proposed additional refinements to ensure a more precise and layered annotation process.

During the pilot phase, we collaborated closely with the client to:

Clarify classification rules for key categories, including:
- Insults and abusive language
- Mentions of personal information
- References to meeting arrangements
- Negative sentiment directed at the platform
Address complex edge cases, such as:
- Implicit mentions of meeting locations (e.g., vague geographic references without full addresses)

Annotation and Quality Control Process

Our annotation team at Unidata handled classification by carefully considering:

Platform-specific communication patterns
Informal language use typical in peer-to-peer messaging
The context of each message, not just isolated phrases

Messages were annotated across several primary categories:

Use of profanities or slurs
Disclosure of personal or sensitive information
Various forms of direct and indirect insults
Mentions of meeting points or negotiation outside the platform
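
As an illustration only, the four primary categories above could be represented as a small label schema. This is a minimal sketch, not the schema used in the project: the names `Category` and `AnnotatedMessage`, the `context` field, and the choice to allow several labels per message are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Category(Enum):
    """Hypothetical identifiers for the four primary categories."""
    PROFANITY = "profanity_or_slur"
    PERSONAL_DATA = "personal_or_sensitive_information"
    INSULT = "direct_or_indirect_insult"
    OFF_PLATFORM = "meeting_point_or_off_platform_negotiation"


@dataclass
class AnnotatedMessage:
    message_id: str
    text: str
    # Earlier messages from the same conversation, kept so that the
    # annotator can judge the message in context (assumed field).
    context: list[str] = field(default_factory=list)
    # Assumption: one message may receive more than one category.
    labels: set[Category] = field(default_factory=set)

    def is_flagged(self) -> bool:
        """A message is flagged when it carries at least one label."""
        return bool(self.labels)


# Invented example messages, not taken from the client's data.
off_platform = AnnotatedMessage(
    message_id="m-001",
    text="Let's skip the app, meet me by the central station",
    context=["Is the bike still available?"],
    labels={Category.OFF_PLATFORM},
)
clean = AnnotatedMessage(message_id="m-002", text="Is the bike still available?")
```

Keeping the preceding messages next to each label reflects the point made above that the context of a message matters, not just isolated phrases.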

Data Validation

To ensure the highest level of annotation accuracy, we implemented a robust validation workflow:

Involved experienced validators to review annotated samples
Introduced an interactive error analysis process, which included:
- Team discussions of edge cases
- Targeted surveys to refine judgment on difficult categories

We also conducted training and testing sessions with annotators focused on:

Eliminating errors in high-complexity cases
Aligning the team on annotation logic and edge-case handling
Ensuring consistent interpretation of classification criteria
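
One common way to put a number on consistent interpretation is an agreement score between an annotator and a validator on the same messages. The case study does not say which metric was used, so the sketch below is an assumption: it computes Cohen's kappa by hand, and the function name `cohens_kappa` and the label strings are invented for the example.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Share of items on which both raters chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Invented labels for six messages, for illustration only.
annotator = ["insult", "none", "personal", "none", "insult", "none"]
validator = ["insult", "none", "personal", "insult", "insult", "none"]

kappa = cohens_kappa(annotator, validator)

# Indices where the two disagree, i.e. candidates for team discussion.
disagreements = [i for i, (a, b) in enumerate(zip(annotator, validator)) if a != b]
```

The list of disagreeing indices is the kind of input an error analysis session could start from, since it points directly at the messages on which the criteria were read differently.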

Stage	Input	Workflow Scope	Main Quality Checks
Project Setup	Client guidelines & chat data	Instruction review, clarification, tone alignment	Guideline clarity / linguistic consistency
Pilot Phase	Sample conversations	Testing annotation logic, resolving edge cases	Tone accuracy / ambiguity reduction
Annotation	Chat messages & reply suggestions	Labeling relevance, safety, tone, grammar	Context alignment / toxicity filtering
Linguistic Control	Annotated responses	Informal style, natural phrasing validation	Fluency / conversational realism
Validation & QA	Annotated batches	Sampling, validator review, escalation of edge cases	Accuracy / policy compliance
Feedback Loop	QA results	Performance tracking, annotator feedback	Error reduction / consistency
Training & Support	Validators	Ongoing training, targeted improvements	Validator accuracy
Final Delivery	Validated dataset	Packaging and handoff	Dataset readiness / deployment quality

Project Setup & Guideline Alignment: 1 week
Pilot Phase & Linguistic Calibration: 2 weeks
Annotation & Validation Phase: 2 weeks
Final Evaluation & Delivery: 1 week

The Results

The model trained on our annotated data was successfully tested and deployed on the client’s platform. Internal testing evaluated model performance against randomly selected user messages.

The initial testing phase showed promising results:
- The model accurately blocked inappropriate or restricted content
- Responses remained contextually appropriate across various scenarios
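
Testing against randomly selected messages can be sketched as drawing a random sample and comparing the model's block decisions with human judgments. The case study reports no metrics, so everything below is an assumption for illustration: the helper names `sample_messages` and `precision_recall`, the fixed seed, and the made-up decisions.

```python
import random


def sample_messages(messages, k, seed=0):
    """Draw k messages at random; a fixed seed keeps the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(messages, k)


def precision_recall(predicted, actual):
    """Precision and recall for boolean 'should be blocked' decisions."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # correctly blocked
    fp = sum(1 for p, a in pairs if p and not a)    # blocked but harmless
    fn = sum(1 for p, a in pairs if not p and a)    # harmful but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Invented message ids and decisions, not results from the project.
all_ids = [f"m-{i:03d}" for i in range(100)]
sample = sample_messages(all_ids, k=5)

model_blocked = [True, True, False, False, True]
human_says_block = [True, False, False, True, True]

precision, recall = precision_recall(model_blocked, human_says_block)
```

Precision shows how often a blocked message really deserved it, while recall shows how much harmful content was caught. A filter that must not slow conversations down usually needs both to be tracked.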

In conversational AI, the hardest part isn’t detecting toxicity. It’s generating responses that are neutral, context-aware, and still sound human. That balance only comes from carefully annotated real dialogue.

Vladislav Barsukov: Head of SLM&LLM Annotation

