The Task
Biometric authentication sounds simple until you try to scale it.
The client was building an AI-powered biometric system for banking terminals, designed to ensure secure customer authentication and payment processing through palm recognition. To train and validate the underlying biometric algorithms, the project required a large, high-quality dataset of 20,000 palm image sets.
Each set had to include six strictly standardized photos: three taken with the front camera and three with the rear camera of a mobile phone.
At this scale, broad geographic coverage and device diversity were also critical. At the same time, we had to keep the cost within a fixed budget, which turned the project into an exercise in precise traffic and workflow optimization.
The Solution
Multi-platform strategy
We approached sourcing as a controlled system rather than a single-channel launch.
Data collection was distributed across several international platforms to balance speed, diversity, and cost. Prolific became the primary source due to its stable participant flow, high response quality, and flexible filtering capabilities.
At the same time, we continuously monitored platform performance and redistributed traffic when needed to avoid slowdowns or quality drops. This allowed us to maintain a consistent collection pace without overloading a single source.
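As an illustration only, traffic redistribution of this kind could be sketched as below. The scoring rule (acceptance rate per unit cost), the 60% cap per platform, and all names such as `PlatformStats` and `allocate_shares` are assumptions made for the example, not the logic used on the project.

```python
from dataclasses import dataclass

@dataclass
class PlatformStats:
    name: str            # hypothetical platform label
    submitted: int       # sets uploaded in the recent window
    accepted: int        # sets that passed validation in that window
    cost_per_set: float  # assumed payout per submitted set, must be > 0

def allocate_shares(stats, max_share=0.6):
    """Split upcoming traffic by acceptance rate per unit cost, capping each platform."""
    scores = {}
    for s in stats:
        acceptance = s.accepted / s.submitted if s.submitted else 0.0
        scores[s.name] = acceptance / s.cost_per_set
    shares, budget = {}, 1.0
    while scores:
        total = sum(scores.values())
        if total == 0:
            # No usable signal left: split the remaining traffic evenly.
            for name in scores:
                shares[name] = budget / len(scores)
            break
        over = [n for n, sc in scores.items() if budget * sc / total > max_share]
        if not over:
            for n, sc in scores.items():
                shares[n] = budget * sc / total
            break
        # Cap dominant platforms and hand the rest to the others.
        for n in over:
            shares[n] = max_share
            budget -= max_share
            del scores[n]
    return shares
```

Recomputing the shares on a short rolling window would let a slowdown or quality drop on one platform shift traffic to the others without manual intervention.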
Process organization and quality control
At this volume, most risks come from small inconsistencies repeated thousands of times.
To minimize these risks, we designed a structured capture flow:
- detailed step-by-step instructions with visual examples
- clear requirements for framing, positioning, and lighting
- device-specific clarifications where necessary
We complemented this with automated validation at the upload stage:
- format and completeness checks
- basic quality filters such as resolution and alignment
- instant rejection of invalid submissions
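The upload checks listed above can be sketched roughly as follows. This is a minimal example under stated assumptions: the allowed formats, the minimum resolution, the `front`/`rear` camera labels, and the `Photo` and `validate_set` names are all hypothetical, and alignment checks are left out because they would need actual image analysis.

```python
from collections import Counter
from dataclasses import dataclass

ALLOWED_FORMATS = {"jpg", "jpeg", "png"}        # assumed accepted formats
MIN_SIDE = 1080                                 # assumed minimum shorter side, in pixels
REQUIRED_PER_CAMERA = {"front": 3, "rear": 3}   # assumed reading of the six-photo rule

@dataclass
class Photo:
    filename: str
    width: int
    height: int
    camera: str  # assumed label attached at capture time

def validate_set(photos):
    """Return a list of rejection reasons; an empty list means the set is accepted."""
    reasons = []
    # Completeness: the right number of photos per camera.
    counts = Counter(p.camera for p in photos)
    for camera, required in REQUIRED_PER_CAMERA.items():
        got = counts.get(camera, 0)
        if got != required:
            reasons.append(f"expected {required} {camera} photos, got {got}")
    for p in photos:
        # Format: judged by file extension only in this sketch.
        ext = p.filename.rsplit(".", 1)[-1].lower() if "." in p.filename else ""
        if ext not in ALLOWED_FORMATS:
            reasons.append(f"{p.filename}: unsupported format")
        # Basic quality filter: resolution.
        if min(p.width, p.height) < MIN_SIDE:
            reasons.append(f"{p.filename}: resolution below {MIN_SIDE}px")
    return reasons
```

Returning the reasons rather than a bare pass/fail flag lets the upload form tell the participant immediately what to retake.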
Quality control was not treated as a final step. We monitored incoming data in real time, conducted regular sampling, and tracked recurring errors.
When patterns appeared, we adjusted instructions and task logic, reducing future error rates instead of only filtering results afterward.
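One way to picture the sampling and error-tracking loop is the sketch below. The 5% sampling rate, the 10% threshold for calling an error "recurring", and the error labels are illustrative assumptions, not the project's actual parameters.

```python
import random
from collections import Counter

def sample_for_review(set_ids, rate=0.05, seed=None):
    """Pick a random subset of validated sets for manual review."""
    ids = list(set_ids)
    rng = random.Random(seed)
    k = min(len(ids), max(1, round(len(ids) * rate)))
    return rng.sample(ids, k)

def recurring_errors(review_results, min_share=0.10):
    """Return error labels found in at least `min_share` of reviewed sets.

    review_results: one list of error labels per reviewed set
    (an empty list means the reviewer found no issues).
    """
    reviewed = len(review_results)
    if reviewed == 0:
        return {}
    # Count each label once per set, so one bad set cannot dominate.
    counts = Counter(label for labels in review_results for label in set(labels))
    return {label: n / reviewed for label, n in counts.items()
            if n / reviewed >= min_share}
```

A label that crosses the threshold is a signal to revise the instructions or task logic for that step, rather than to keep rejecting the same mistake.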
Audience management and filtering
Prolific’s filtering capabilities became a key control mechanism.
We used them not only to match basic criteria, but to stabilize the entire pipeline:
- selecting participants with suitable devices
- prioritizing users with strong task history
- balancing geographic distribution
This helped maintain consistently high upload quality and predictable throughput, while reducing noise and rework.
| Stage | Input | Workflow Scope | Main Quality Checks |
|---|---|---|---|
| Participant Sourcing | Platform traffic, targeting rules | Multi-platform launch with Prolific as main source | Demographic balance, device diversity |
| Photo Collection | Raw palm photo sets | Collection of 6 mandatory images per participant | Angle correctness, lighting, focus |
| Primary Validation | Uploaded photo sets | Assessor review of metadata and visual criteria | Completeness, instruction compliance |
| Quality Control (QC) | Validated sets | Daily sampling, consistency checks, feedback loops | Error rate, assessor accuracy |
| Dataset Assembly | Approved photo sets + metadata | Structuring IDs, metadata files, packaging | Structural integrity, format compliance |
| Deduplication | Prepared dataset | Automated duplicate detection across releases | Uniqueness, release integrity |
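For the deduplication stage, a minimal sketch of cross-release duplicate detection might look like the following. It assumes exact byte-level matching with SHA-256 from Python's standard `hashlib`; the function name and identifiers are hypothetical, and catching re-encoded or re-photographed duplicates would require a different technique, such as perceptual hashing.

```python
import hashlib

def find_duplicates(new_files, seen_hashes):
    """Flag files whose bytes match a file from this or an earlier release.

    new_files:   dict mapping a file identifier to its raw bytes
    seen_hashes: dict mapping SHA-256 digest to the first identifier seen;
                 updated in place so it can be reused across releases
    """
    duplicates = []
    for file_id, data in new_files.items():
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen_hashes:
            duplicates.append((file_id, seen_hashes[digest]))
        else:
            seen_hashes[digest] = file_id
    return duplicates
```

Persisting `seen_hashes` between batches is what makes the check work across releases rather than only within one.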
The Results
- Collected 20,000 palm sets and launched the next batch of 20,000
- Delivered a fully verified dataset ahead of schedule with consistently high quality
- Scaled the process while keeping predictable speed and global coverage, with unified standards regardless of region, device or platform
Large-scale biometric datasets are built on process discipline, not volume alone. Stable quality emerges when sourcing, instructions, validation, and deduplication work as a single continuous pipeline rather than isolated steps.
Hanna Parkhots, Data Collection Team Lead