I’ve spent 16+ years finding what breaks. Now I engineer quality into systems that fail in new ways: AI-powered products.
Playwright · TypeScript · Python · CI/CD with GitHub Actions
LLM & Agent Evaluation · AI Observability · Test Automation Architecture
Remote Contractor (US Time Zones)
AI Quality Engineer & QA Automation Architect with 16+ years of experience building test and evaluation frameworks that help teams ship reliable software, including AI-powered products.
Designs and implements automated test and evaluation pipelines using Playwright, TypeScript, and Python, integrated into CI/CD (GitHub Actions). Applies LLM evaluation techniques including reference-based and reference-free methods, LLM-as-a-judge, RAG evaluation, and adversarial testing to validate non-deterministic AI systems in production.
Known for translating quality risk into engineering solutions: scalable frameworks, observable pipelines, and actionable metrics across web, mobile, backend, and AI-powered platforms.
Building #AITestingJourney: a public R&D log of what actually runs in a production eval pipeline.