Opportunity Description
Key Accountabilities
- Definition and execution of testing and quality assurance strategies for AI‑enabled workflows
- Continuous evaluation and monitoring of system behavior in production environments
- Contribution to auditability, risk management, and continuous quality improvement

Principal Responsibilities
- Define quality criteria and testing strategies for agent workflows, covering accuracy, latency, safety, compliance, and operational risk
- Build automated evaluation harnesses to assess agent performance, including hallucination rates, tool misuse, policy violations, and task success
- Implement continuous production monitoring to detect anomalies, quality degradation, and emerging safety concerns
- Develop and maintain automated test suites using Playwright for UI testing and custom scripts for API and workflow validation
- Apply LLM evaluation frameworks to assess output quality, regression, and system dri...
Full time
Computer Occupations