Client Request
The client wanted to improve the user experience with a system that automatically and accurately identifies the product model mentioned in listing titles and descriptions. The goal was to help buyers compare relevant offers quickly by grouping listings for the same model under unified product cards.
To achieve this, the platform engaged Unidata for high-quality data annotation, and we got to work.
Our Approach
Technical Scope and Pilot Phase
The client provided detailed guidelines for identifying product models from listing text.
Our team reviewed the instructions and proposed refinements, including:
- How to handle product attributes (e.g., color or storage capacity) that appeared in titles but were not part of the actual model
- How to treat differences in naming conventions across product categories
During the pilot phase, key challenges included:
- Multilingual listings
- Numerous abbreviations and non-standard formatting
- Ambiguities requiring client clarification
Annotation and Review Process
Over the course of two months, our team annotated 20,000 listings, focusing on precise model identification. Key challenges we addressed included:
- Identifying relevant model keywords in long and often cluttered titles
- Extracting model names from product descriptions, especially in categories like fashion, where listings often contained attributes (e.g., sleeve length, material, color) irrelevant to the model itself
- Standardizing model names across similar listings
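The attribute-handling and standardization rules above were applied by human annotators. As a minimal illustration only, here is how rules of that kind might be expressed in Python, assuming a hypothetical color vocabulary (`COLOR_WORDS`) and a regular expression for storage sizes (`STORAGE_RE`). The function name `canonical_model_key` is likewise illustrative and not part of the client's system.

```python
import re
import unicodedata

# Hypothetical attribute vocabulary (assumption); real projects would keep
# per-category and per-language lists.
COLOR_WORDS = {"black", "white", "blue", "red", "silver", "gold", "green"}

# Storage capacities such as "128GB", "256 gb", "1TB" describe a variant, not the model.
STORAGE_RE = re.compile(r"\b\d+\s?(?:gb|tb)\b")


def canonical_model_key(title: str) -> str:
    """Reduce a listing title to a normalized model key (illustrative rules only)."""
    # Unify Unicode forms (e.g. full-width digits) and case.
    text = unicodedata.normalize("NFKC", title).lower()
    # Drop storage-capacity attributes.
    text = STORAGE_RE.sub(" ", text)
    # Replace punctuation with spaces so "iPhone-13" and "iPhone 13" match.
    text = re.sub(r"[^\w\s]", " ", text)
    # Drop color words, which are attributes rather than part of the model.
    tokens = [t for t in text.split() if t not in COLOR_WORDS]
    return " ".join(tokens)
```

For example, `"Apple iPhone 13, 128GB, Blue"` and `"apple iPhone-13 blue 256 GB"` both reduce to `"apple iphone 13"`. Multilingual listings and categories such as fashion would need much richer attribute vocabularies than this sketch assumes.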
To ensure consistency across annotators, we:
- Developed a set of internal rules and examples
- Conducted training sessions to reduce subjective variation
- Implemented continuous review and feedback throughout the annotation phase
Validation Workflow
All annotations underwent a thorough validation process to ensure accuracy.
Because model identification involved subjective judgment, we took the following steps:
- Held regular sync meetings to align interpretations
- Updated annotation guidelines based on team feedback and corner cases
- Provided ongoing training and clarification sessions for the team
Validator performance was monitored using analytics to:
- Identify outliers or inconsistencies
- Optimize review efficiency
- Improve overall data quality
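The case study does not specify which analytics were used to monitor validators. One common approach, sketched here under that assumption, is to compute each validator's agreement rate against the final reviewed label and flag anyone far below the team mean. The `(validator_id, agreed)` record schema and the `z_threshold` default are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def agreement_rates(records):
    """records: iterable of (validator_id, agreed) pairs, where agreed is True when
    the validator's decision matched the final reviewed label (assumed schema)."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for validator, agreed in records:
        totals[validator] += 1
        hits[validator] += int(agreed)
    return {v: hits[v] / totals[v] for v in totals}


def flag_outliers(rates, z_threshold=1.5):
    """Return validators whose agreement rate lies more than z_threshold
    population standard deviations below the team mean, sorted by ID."""
    values = list(rates.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Everyone agrees equally often; nothing stands out.
        return []
    return sorted(v for v, r in rates.items() if (mu - r) / sigma > z_threshold)
```

With a small team a z-score is only a coarse signal; median-based measures or per-category breakdowns may be more robust.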
Results
The model trained on the annotated dataset was successfully deployed to the production system.
- Listings are now automatically grouped into product cards based on the identified model
- The grouping logic correctly handles edge cases and non-standard listings
- Real-user testing conducted by the client confirmed the model's effectiveness, even on complex or ambiguous examples
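Conceptually, the grouping step amounts to a group-by on the predicted model. This sketch assumes the deployed model emits one model string (or `None`) per listing; `build_product_cards` and its light normalization are hypothetical and do not describe the client's production logic.

```python
from collections import defaultdict
from typing import Iterable, Optional


def build_product_cards(
    predictions: Iterable[tuple[str, Optional[str]]],
) -> tuple[dict[str, list[str]], list[str]]:
    """Group listing IDs into product cards keyed by predicted model.

    predictions: (listing_id, predicted_model) pairs; predicted_model is None
    when no model was identified (assumed convention).
    Returns (cards, unmatched); unmatched listings stay as standalone offers.
    """
    cards: dict[str, list[str]] = defaultdict(list)
    unmatched: list[str] = []
    for listing_id, model in predictions:
        if not model or not model.strip():
            unmatched.append(listing_id)
            continue
        # Case-fold and collapse whitespace so trivial differences don't split a card.
        key = " ".join(model.casefold().split())
        cards[key].append(listing_id)
    return dict(cards), unmatched
```

Keeping unidentified listings aside, rather than forcing them into the nearest card, is one way to avoid merging unrelated offers.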