Conduct ground truth reviews - independently evaluate a representative sample of AI-scored Mid-Market sales call transcripts against defined quality skill rubrics.
Support diagnostic deep-dives - participate in targeted investigations on calls flagged by the governance system for potential scoring issues.
Calibrate regularly - complete calibration exercises using anchor examples (clear PASS, clear FAIL, borderline) to maintain scoring consistency across the team.
Track inter-rater reliability - log review outcomes in the governance review tool and flag disagreements with AI scores for resolution.
Categorise disagreements by governance pillar - classify scoring discrepancies across Score Accuracy, Attribution Integrity, Bias & Fairness, Consistency, Coverage, and Drift to identify systemic patterns.
Feed findings into prompt refinement - document and communicate patterns in AI scoring errors to support the prompt engineering cycle and improve model performance.
Expand the ground truth dataset - contribute reviewed and validated call assessments to the growing ground truth corpus used for model benchmarking and governance reporting.
Required Qualifications
5 years of experience in QA, auditing, or quality review - ideally in a sales, contact centre, or customer-facing environment.
Demonstrated analytical skills, strong attention to detail, and the ability to apply rubrics consistently.
Experience evaluating sales conversations (calls, transcripts, or recordings) against structured quality frameworks.
Comfort working with AI-generated outputs - ability to critically assess automated scores rather than default to them.
Strong written communication skills for documenting findings and flagging patterns clearly.
Self-directed and reliable in a remote working environment with minimal supervision.
Preferred Qualifications
Experience with LLM evaluation, AI output review, or human-in-the-loop quality processes.
Experience in quality assurance, performance assessment, or similar evaluation work.
Background in sales coaching, enablement, or sales methodology (e.g., MEDDIC, Challenger, SPIN).
Proficiency in multiple languages relevant to EMEA Mid-Market sales.
Experience with inter-rater reliability measurement or calibration frameworks.