Video Pose Estimation for Proctoring
We helped an education technology company build a training dataset for detecting suspicious student behavior during exams by annotating human pose keypoints in 6000 video frames. Models trained on this data can monitor body movements and posture in real time, supporting automated exam proctoring.
Task
The client needed video data annotated with human pose keypoints to train models capable of identifying behaviors such as looking away from the screen, leaning toward neighbors, or leaving the frame.
Challenges included:
- Multiple students per frame with overlapping limbs and furniture.
- Variations in posture, occlusions, and partial visibility.
- Short turnaround time to meet the client’s development schedule.
Solution
Iterative Annotation Workflow
The project was divided into three batches of 2000 frames each. The first batch was annotated entirely by hand to establish a high-quality baseline. Subsequent batches were pre-annotated using the client’s tools, then reviewed and corrected by our team, reducing annotation time by up to 40% while maintaining consistency.
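The batch-and-review flow described above can be illustrated with a minimal sketch. The function names, the keypoint names, and the dictionary layout are hypothetical and chosen for illustration only; the client's actual pre-annotation tools and data format are not described in this case study.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: keypoint name -> (x, y) position in pixels.
Keypoints = Dict[str, Tuple[float, float]]


def split_into_batches(frame_ids: List[int], batch_size: int) -> List[List[int]]:
    """Split frame IDs into consecutive batches of at most batch_size frames."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [frame_ids[i:i + batch_size] for i in range(0, len(frame_ids), batch_size)]


def review_pre_annotation(pre: Keypoints, corrections: Keypoints) -> Keypoints:
    """Return a copy of the pre-annotation with reviewer corrections applied on top."""
    reviewed = dict(pre)          # leave the machine output untouched
    reviewed.update(corrections)  # the reviewer's positions take precedence
    return reviewed


# 6000 frames in batches of 2000, as in the project.
batches = split_into_batches(list(range(6000)), batch_size=2000)

# A pre-annotated frame where the reviewer moves only one keypoint.
pre = {"nose": (100.0, 50.0), "left_wrist": (80.0, 200.0)}
fixed = review_pre_annotation(pre, {"left_wrist": (84.0, 196.0)})
```

Under this scheme the reviewer only touches keypoints that need correction, which is where the time saving over fully manual annotation would come from.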
Handling Complex Poses in Crowded Settings
Strict internal guidelines governed keypoint placement under occlusion, overlapping limbs, and diverse postures, keeping annotations precise in crowded frames. This level of precision was critical for downstream model training.
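One common way to record occluded keypoints is a per-keypoint visibility flag, as in the COCO keypoint convention (0 = not labeled, 1 = labeled but not visible, 2 = labeled and visible). The sketch below assumes that convention purely for illustration; the case study does not state which annotation format was used, and the class and keypoint names are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Visibility(IntEnum):
    NOT_LABELED = 0  # keypoint outside the frame or impossible to place
    OCCLUDED = 1     # position labeled, but hidden (e.g. by a desk or a neighbor)
    VISIBLE = 2      # position labeled and directly visible


@dataclass
class Keypoint:
    name: str
    x: float
    y: float
    visibility: Visibility


def to_flat_triplets(keypoints: List[Keypoint]) -> List[float]:
    """Flatten to [x1, y1, v1, x2, y2, v2, ...]; unlabeled points become 0, 0, 0."""
    flat: List[float] = []
    for kp in keypoints:
        if kp.visibility == Visibility.NOT_LABELED:
            flat.extend([0, 0, 0])
        else:
            flat.extend([kp.x, kp.y, int(kp.visibility)])
    return flat


def count_labeled(keypoints: List[Keypoint]) -> int:
    """Count keypoints that carry a position, whether visible or occluded."""
    return sum(1 for kp in keypoints if kp.visibility != Visibility.NOT_LABELED)


# Hypothetical student whose left wrist is hidden behind a desk
# and whose right wrist is outside the frame.
student = [
    Keypoint("nose", 100.0, 50.0, Visibility.VISIBLE),
    Keypoint("left_wrist", 80.0, 200.0, Visibility.OCCLUDED),
    Keypoint("right_wrist", 0.0, 0.0, Visibility.NOT_LABELED),
]
flat = to_flat_triplets(student)
```

Separating "occluded" from "not labeled" lets a model still learn the estimated position of a hidden limb while ignoring keypoints that annotators could not place at all.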
Team Training and Domain Immersion
Annotators completed specialized training that included studying anatomical references, reviewing client exam footage, and attending weekly QA sessions to resolve edge cases. This preparation enabled accurate recognition of subtle posture variations and movement patterns.
| Stage | Input | Workflow Scope | Main Quality Checks |
|---|---|---|---|
| Requirements Alignment | Client goals, exam video footage | Definition of keypoints and behavior scenarios | Clarity, edge cases, feasibility |
| Guidelines Development | Sample frames, pose references | Annotation rules for occlusions, overlapping limbs | Consistency, anatomical correctness |
| Annotator Training | Guidelines, reference materials | Training on pose estimation, calibration tasks | Keypoint accuracy, readiness |
| Video Annotation | Exam video sequences | Frame-by-frame keypoint annotation, multi-person tracking | Temporal consistency, precision |
| Iterative Validation | Annotated batches | Review, correction, integration of pre-annotations | Error reduction, consistency |
| Final QA | Validated dataset | Dataset consolidation and delivery | Completeness, client acceptance |
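The temporal consistency check listed for the Video Annotation stage could be automated, at least in part, by flagging keypoints that move implausibly far between consecutive frames. The following is a minimal sketch under assumptions: the function name, the pixel threshold, and the per-frame dictionary layout are hypothetical and do not describe the team's actual QA tooling.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical layout: one dict per frame for a single tracked person,
# mapping keypoint name -> (x, y) position in pixels.
Pose = Dict[str, Tuple[float, float]]


def find_keypoint_jumps(
    track: List[Pose], max_jump_px: float
) -> List[Tuple[int, str, float]]:
    """Return (frame_index, keypoint_name, distance) for every keypoint that
    moves more than max_jump_px between two consecutive frames."""
    flagged = []
    for i in range(1, len(track)):
        prev, curr = track[i - 1], track[i]
        # Compare only keypoints annotated in both frames.
        for name in sorted(prev.keys() & curr.keys()):
            dist = math.dist(prev[name], curr[name])
            if dist > max_jump_px:
                flagged.append((i, name, dist))
    return flagged


track = [
    {"nose": (100.0, 50.0), "left_wrist": (80.0, 200.0)},
    {"nose": (102.0, 51.0), "left_wrist": (82.0, 201.0)},
    {"nose": (103.0, 51.0), "left_wrist": (182.0, 201.0)},  # wrist jumps 100 px
]
jumps = find_keypoint_jumps(track, max_jump_px=40.0)
```

A flagged jump is not necessarily an error, since a student may move quickly, so a check like this is better suited to routing frames to a human reviewer than to rejecting annotations automatically.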
The Results
- 6000 frames annotated within 3 months, including verification and correction cycles.
- Each batch delivered on time, supporting the client’s agile development process.
- The high-quality dataset improved pose detection accuracy, enabling more effective automated proctoring.
Pose estimation in video requires precise tracking of keypoints across frames and consistent handling of occlusions and multi-person scenes. Model performance depends on temporal consistency, clear annotation rules, and iterative quality control.
- Roman Lukoshin
- Speech and Generative Data Manager