Task
A client needed to process surveillance footage from a factory entrance to enable automatic employee identification and matching with an access control system.
The dataset included video from three camera angles:
- two cameras inside the entrance area
- one monitoring the exit
Goal:
Transform raw surveillance video into a structured dataset for:
- person detection
- identity matching (ID linkage)
Key challenges:
- excessive volume of irrelevant frames
- inaccuracies in neural network pre-annotation
- need for precise alignment between visual data and employee IDs
Solution
01. Video Preprocessing & Frame Reduction
Raw footage contained a large amount of non-informative data.
We introduced a filtering stage:
- removed more than 80% of frames as irrelevant
- reduced dataset size from 50–60K to ~8K frames
With fewer frames to annotate, the later stages ran faster, and the dataset was limited to frames relevant to person detection.
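The case study does not say which criterion was used to decide that a frame was irrelevant. As a minimal sketch, assuming a frame counts as non-informative when it barely differs from an empty-entrance background frame, the filter could look like this. The flattened grayscale frame representation, the function names, and the threshold value are all illustrative assumptions, not the project's actual implementation.

```python
from typing import List, Sequence

# Hypothetical representation: a frame is a flat sequence of grayscale pixels (0-255).
Frame = Sequence[int]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel absolute difference between two same-sized frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of pixels")
    if not a:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_informative_frames(
    frames: List[Frame], background: Frame, threshold: float = 10.0
) -> List[int]:
    """Return indices of frames that differ from the empty-scene background.

    The threshold is an assumed value and would need tuning per camera.
    """
    return [
        index
        for index, frame in enumerate(frames)
        if mean_abs_diff(frame, background) >= threshold
    ]


# Tiny synthetic clip: the entrance is empty, someone appears, then leaves.
empty_scene = [20] * 16
person_enters = [20] * 8 + [200] * 8
clip = [empty_scene, empty_scene, person_enters, person_enters, empty_scene]
print(select_informative_frames(clip, empty_scene))
```

In practice the frames would come from a video decoder and the background would be refreshed over time to cope with lighting changes; both are left out here to keep the sketch self-contained.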
02. Neural Pre-annotation with Manual Refinement
We combined automation with human validation:
- neural network used for initial person detection
- manual correction of false positives
- precise adjustment of bounding boxes
This hybrid approach balanced speed with accuracy.
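One way to quantify how much the manual pass changed the pre-annotation is to compare model boxes against the refined boxes using intersection over union (IoU). The sketch below is an illustration only: the box layout `(x_min, y_min, x_max, y_max)`, the function names, and the 0.5 cutoff are assumptions, and the source does not state that this metric was used on the project.

```python
from typing import List, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # boxes do not overlap
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def count_false_positives(
    predicted: List[Box], refined: List[Box], min_iou: float = 0.5
) -> int:
    """Count predicted boxes that no refined box overlaps at min_iou or more."""
    return sum(
        1
        for p in predicted
        if all(iou(p, r) < min_iou for r in refined)
    )
```

A predicted box with no sufficiently overlapping refined box is treated here as a false positive that the annotator removed; a low but non-zero IoU suggests a box that was kept but heavily adjusted.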
03. Automated ID Matching Integration
To connect visual data with identity data, we:
- developed a script to match employee IDs
- aligned annotations with access control system records
This turned a detection-only dataset into one that also supports employee identification.
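The matching script itself is not described in detail. A minimal sketch of the idea, assuming that access control records carry an employee ID and a timestamp and that each detection is linked to the nearest record within a small time window, might look like the following. The class names, field names, sample IDs, and the 5-second tolerance are hypothetical.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class AccessEvent:
    """One access control record (assumed fields)."""
    employee_id: str
    timestamp: datetime


@dataclass(frozen=True)
class Detection:
    """One annotated person detection (assumed fields)."""
    frame_index: int
    timestamp: datetime
    employee_id: Optional[str] = None


def link_ids(
    detections: List[Detection],
    events: List[AccessEvent],
    tolerance: timedelta = timedelta(seconds=5),
) -> List[Detection]:
    """Attach the nearest access event's ID to each detection, if close enough."""
    linked = []
    for det in detections:
        nearest = min(
            events,
            key=lambda e: abs(e.timestamp - det.timestamp),
            default=None,
        )
        if nearest is not None and abs(nearest.timestamp - det.timestamp) <= tolerance:
            det = replace(det, employee_id=nearest.employee_id)
        linked.append(det)  # unmatched detections keep employee_id=None
    return linked
```

Detections that fall outside the window stay unlinked rather than receiving a guessed ID, which leaves them visible for the validation stage.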
04. Validation & Quality Control
A dedicated validation stage ensured consistency:
- verification of pre-annotation outputs
- correction of detection errors
- refinement of object boundaries
Special focus was placed on alignment between detected individuals and assigned IDs.
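Checks on ID alignment can be partly automated before a human reviewer looks at the data. The sketch below assumes a simple record layout of `(frame_index, employee_id)` per detection and flags two conditions: detections with no ID, and frames where the same ID is attached to more than one detection. Both the layout and the choice of checks are illustrative assumptions rather than the project's documented QA rules.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Assumed record layout: (frame_index, employee_id or None) for each detection.
Record = Tuple[int, Optional[str]]


def find_alignment_issues(records: List[Record]) -> Dict[str, List[int]]:
    """Return frame indices that need manual review, grouped by issue type."""
    # Frames containing at least one detection without an employee ID.
    unlinked = sorted({frame for frame, emp in records if emp is None})

    # Frames where one employee ID is assigned to several detections.
    per_frame = Counter(
        (frame, emp) for frame, emp in records if emp is not None
    )
    duplicated = sorted({frame for (frame, _), n in per_frame.items() if n > 1})

    return {"unlinked_frames": unlinked, "duplicate_id_frames": duplicated}
```

The returned frame lists can be handed to reviewers as a worklist, so manual effort goes to the frames most likely to contain an identity mismatch.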
| Stage | Input | Workflow Scope | Main Quality Checks |
|---|---|---|---|
| Video Preprocessing | Raw surveillance footage | Frame filtering, data reduction | Relevance of frames / noise reduction |
| Frame Extraction | Filtered video | Selection of usable frames | Frame quality / coverage |
| Pre-annotation | Extracted frames | Neural network-based person detection | Detection accuracy / false positives |
| Manual Refinement | Pre-annotated data | Correction and bounding box adjustment | Boundary precision / consistency |
| ID Matching | Annotation + ID data | Automated linking of employees to detections | ID alignment accuracy |
| Validation & QA | Final dataset | Multi-stage verification and refinement | Consistency / identity matching quality |
| Final Delivery | Completed dataset | Packaging and integration readiness | System compatibility |
The Results
- Frame volume reduced by more than 80% (from 50–60K to ~8K frames)
- Faster annotation workflow due to pre-annotation
- Improved accuracy through filtering and manual refinement
- Reliable dataset for employee detection and ID matching
In surveillance data, more frames don’t mean better results. The real impact comes from filtering noise, focusing on relevant moments, and ensuring every annotation aligns with identity data.
Roman Lukoshin, Speech and Generative Data Manager