Task
The client required a dataset that reflects how facial features evolve throughout childhood and early adolescence. Core requirements included:
- Accurate age verification for every image
- Diversity across ethnicity, geography, and gender
- Year-by-year continuity, allowing models to distinguish natural growth from identity mismatch
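The year-by-year continuity requirement can be illustrated with a small coverage check. This is a minimal sketch, not the project's actual tooling: it assumes each image is described by a hypothetical `(subject_id, age_in_years)` pair and reports the ages missing inside each subject's own observed span.

```python
from collections import defaultdict


def find_missing_years(records):
    """Return {subject_id: [missing ages]} for subjects with gaps in their age sequence.

    `records` is an iterable of (subject_id, age_in_years) pairs; this record
    shape is an assumption made for illustration only.
    """
    ages_by_subject = defaultdict(set)
    for subject_id, age in records:
        ages_by_subject[subject_id].add(age)

    gaps = {}
    for subject_id, ages in ages_by_subject.items():
        # Every whole year between the youngest and oldest photo should be present.
        full_span = set(range(min(ages), max(ages) + 1))
        missing = sorted(full_span - ages)
        if missing:
            gaps[subject_id] = missing
    return gaps


# Hypothetical example data: subject "a" has no photo at age 7.
example = [("a", 5), ("a", 6), ("a", 8), ("b", 3), ("b", 4)]
gaps = find_missing_years(example)
```

Subjects that appear in the result would need additional photos before their set can be treated as continuous.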
Key Challenges
Ensuring Age and Identity Consistency
- Verifying real ages without access to official identity documents
- Covering multiple regions with different cultural and photographic conditions
- Limited availability of high-quality images of children
- Ensuring each photo set belonged to the same individual and matched the declared age
Solution
Dataset design and methodology
- Defined the target age range and prioritized ethnic and regional groups
- Developed an age-verification approach combining visual assessment and metadata analysis
- Created clear, standardized instructions for participants and crowd platforms, including capture examples
Data collection
- Leveraged established crowd platforms and tested new sources to expand geographic coverage
- Designed simple, engaging tasks to encourage complete and high-quality photo sets
- Provided fair compensation to reduce drop-off and incomplete submissions
- Monitored incoming data in real time to address quality issues early
Validation and quality control
- Combined automated checks with manual expert review to confirm age and photo ownership
- Applied multi-layer validation, with multiple reviewers cross-checking each submission
- Minimized inconsistencies and labeling errors, keeping the overall error rate very low
- Delivered a clean, production-ready dataset suitable for model training and research
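One way to picture the multi-reviewer cross-check is as a simple consensus rule over independent age estimates. The sketch below is an assumption for illustration: the function name, the status labels, and the `tolerance` and `min_reviewers` thresholds are hypothetical, and the case study does not describe the actual decision rules used.

```python
from statistics import median


def review_submission(declared_age, reviewer_estimates, tolerance=1, min_reviewers=2):
    """Classify a submission from independent reviewer age estimates.

    All thresholds and status labels here are hypothetical placeholders.
    """
    if len(reviewer_estimates) < min_reviewers:
        return "needs_more_reviews"

    # Wide disagreement between reviewers is escalated instead of averaged away.
    spread = max(reviewer_estimates) - min(reviewer_estimates)
    if spread > 2 * tolerance:
        return "reviewers_disagree"

    # Compare the reviewers' consensus with the age declared by the participant.
    consensus = median(reviewer_estimates)
    if abs(consensus - declared_age) > tolerance:
        return "age_mismatch"

    return "accepted"


status = review_submission(declared_age=7, reviewer_estimates=[7, 8, 7])
```

Using the median keeps a single outlying estimate from deciding the outcome, while the spread check routes genuinely contested submissions to further review.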
| Stage | Input | Workflow Scope | Main Quality Checks |
|---|---|---|---|
| Project Setup | Client platform & task requirements | Integration, task flow design, access configuration | System connectivity / Task logic consistency |
| Participant Onboarding | Contributor pool | Recruitment, onboarding, instruction delivery | Participant diversity / Instruction clarity |
| Attack Execution | User devices, printed images, replay materials | Print & replay attacks, iterative submissions | Attack variability / Scenario realism |
| Behavior Tracking | Attack attempt data | Tracking attempts, repeat participation, outcome logging | Data completeness / Behavioral consistency |
| Validation & Analysis | Collected attack data | System scoring review, performance analysis | Result consistency / Attack success evaluation |
| Reporting & Iteration | Validated attack datasets | Weekly reporting, feedback loops, system improvement tracking | Trend accuracy / Continuous performance alignment |
The Results
- Achieved high confidence in age accuracy and metadata reliability
- Enabled training for face recognition, anti-fraud systems, and academic research
- Identified consistent patterns of facial development across diverse ethnic and regional groups
Biometric spoofing resilience is built through repeated real-world attack attempts, not static datasets. System performance improves when diverse participants continuously test its limits under varied conditions.
Hanna Parkhots, Data Collection Project Manager