Web4Guru AI Operations

Agent Evaluator

An agent evaluator is a dedicated agent (or rubric-guided model) that grades the output of other agents against acceptance criteria before it ships.

In plain English

An evaluator is a quality gate inside an agent workflow. After a specialist produces a draft, a patch, or a plan, the evaluator reads it against a rubric — "does the code compile, do the tests pass, does the prose match brand voice, is the claim sourced?" — and returns a pass, a fail, or a revision request. If it fails, the specialist retries with the evaluator's feedback.
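The loop below is a minimal sketch of that gate, not Black Box's implementation: generate() and evaluate() are hypothetical stand-ins for the specialist and evaluator model calls, and the rubric check is a deliberately toy one.

    # Minimal sketch of an evaluator-gated workflow. generate() and
    # evaluate() are hypothetical stand-ins for real model calls.
    RUBRIC = [
        "tests pass",
        "prose matches brand voice",
        "claims are sourced",
    ]

    def generate(task, feedback=None):
        # Stand-in for the specialist; a real call would prompt an LLM
        # with the task plus any evaluator feedback.
        suffix = f" (revised per: {feedback})" if feedback else ""
        return f"draft for {task!r}{suffix}"

    def evaluate(draft, rubric):
        # Stand-in for the rubric-prompted evaluator. Returns
        # ("pass", "") or ("revise", feedback for the specialist).
        unmet = [c for c in rubric if c not in draft]  # toy criterion check
        return ("pass", "") if not unmet else ("revise", "; ".join(unmet))

    def produce_with_gate(task, max_retries=3):
        draft = generate(task)
        for _ in range(max_retries):
            verdict, feedback = evaluate(draft, RUBRIC)
            if verdict == "pass":
                return draft                  # quality gate cleared
            draft = generate(task, feedback)  # retry with evaluator notes
        raise RuntimeError("failed evaluation after retries")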

The pattern matters because language models are generally better at judging output against a narrow set of criteria than at generating it flawlessly in one pass. An evaluator with a tight rubric catches sloppy outputs that a single-shot generator would ship. It also gives the whole system a place to accumulate quality knowledge: the rubric is a written artifact the team can edit, review, and test against historical cases.

Why it matters for Black Box

Black Box's Evaluator specialist runs against every production deliverable: landing pages, emails, and code changes. Its rubric lives in agent-constitutions/evaluator-rubric.md. Owners can extend it with their own brand and quality criteria.
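A hedged sketch of what an owner extension could look like, assuming the rubric file is a plain markdown checklist and that new criteria can simply be appended as list items; only the file path comes from the docs, the merge mechanics here are illustrative.

    from pathlib import Path

    # Illustrative only: assumes the rubric is markdown and that owner
    # criteria are appended as list items. Only the path is from the docs.
    RUBRIC_PATH = Path("agent-constitutions/evaluator-rubric.md")

    def extend_rubric(owner_criteria):
        rubric = RUBRIC_PATH.read_text().rstrip()
        additions = "\n".join(f"- {c}" for c in owner_criteria)
        RUBRIC_PATH.write_text(f"{rubric}\n\n## Owner criteria\n{additions}\n")

    extend_rubric([
        "Headlines stay under 60 characters",
        "No superlatives without a cited benchmark",
    ])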

Examples

  • Rejecting a cold-email draft for missing personalization.
  • Approving a code patch only if the tests pass and no unrelated files changed (see the sketch after this list).
  • Scoring a blog draft on clarity, originality, and citation density before publish.
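The second example is concrete enough to sketch. This version assumes pytest as the test runner and a main base branch; both are stand-ins for whatever the project actually uses.

    import subprocess

    # Hedged sketch of the "tests pass, no unrelated files" gate.
    # pytest and the "main" base branch are assumptions.
    def patch_is_approvable(allowed_paths, base="main"):
        if subprocess.run(["pytest", "-q"]).returncode != 0:
            return False, "tests failed"
        diff = subprocess.run(
            ["git", "diff", "--name-only", base],
            capture_output=True, text=True, check=True,
        )
        changed = [f for f in diff.stdout.splitlines() if f]
        strays = [f for f in changed
                  if not any(f.startswith(p) for p in allowed_paths)]
        if strays:
            return False, f"unrelated files changed: {strays}"
        return True, "pass"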

Related terms