Agent Evaluator
An agent evaluator is a dedicated agent (or rubric-guided model) that grades the output of other agents against acceptance criteria before it ships.
In plain English
An evaluator is a quality gate inside an agent workflow. After a specialist produces a draft, a patch, or a plan, the evaluator reads it against a rubric — "does the code compile, do the tests pass, does the prose match brand voice, is the claim sourced?" — and returns a pass, a fail, or a revision request. If it fails, the specialist retries with the evaluator's feedback.
The pattern matters because language models are typically better at judging output against narrow criteria than at generating it correctly in one shot. An evaluator with a tight rubric catches sloppy outputs that a single-pass generator would ship. It also gives the whole system a place to accumulate quality knowledge: the rubric is a written artifact the team can edit, review, and test against historical cases.
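The generate-evaluate-retry loop described above can be sketched in a few lines. This is a minimal illustration, not Black Box's implementation: `generate` and `evaluate` are hypothetical stand-ins (a real system would call a model and a rubric-guided grader), and the "greet the recipient by name" check stands in for a full rubric.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Verdict:
    passed: bool
    feedback: str

def evaluate(draft: str) -> Verdict:
    # Stand-in rubric: a cold email must greet the recipient by name.
    if "Hi Dana" in draft:
        return Verdict(True, "ok")
    return Verdict(False, "Missing personalization: greet the recipient by name.")

def generate(feedback: Optional[str] = None) -> str:
    # Stand-in generator: a real system would call a model,
    # passing the evaluator's feedback into the retry prompt.
    if feedback:
        return "Hi Dana, saw your talk on agent evals..."
    return "Hello, quick question..."

def run_with_gate(max_retries: int = 2) -> str:
    draft = generate()
    for _ in range(max_retries):
        verdict = evaluate(draft)
        if verdict.passed:
            return draft            # quality gate passed; ship it
        draft = generate(verdict.feedback)  # retry with evaluator feedback
    raise RuntimeError("Evaluator rejected all drafts")
```

The key design choice is that the evaluator returns feedback, not just a verdict, so each retry is informed rather than blind.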
Why it matters for Black Box
Black Box's Evaluator specialist runs against every production deliverable — landing pages, emails, code changes. Its rubric lives in agent-constitutions/evaluator-rubric.md. Owners can extend it with their own brand and quality criteria.
Examples
- Rejecting a cold-email draft for missing personalization.
- Approving a code patch only if the tests pass and no unrelated files changed.
- Scoring a blog draft on clarity, originality, and citation density before publishing.
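The code-patch criterion above (tests pass, no unrelated files changed) is mechanical enough to express directly. A hypothetical sketch, with the allowed-file set and inputs assumed rather than taken from Black Box's actual rubric:

```python
def approve_patch(tests_passed: bool, changed_files: list, allowed_files: set) -> bool:
    # Approve only if the test suite passed AND every changed file
    # was within the scope the task declared up front.
    return tests_passed and all(f in allowed_files for f in changed_files)
```

A patch that passes its tests but also edits an unrelated config file would still be rejected, which is the point of encoding both criteria in one gate.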