Judges

Use Judges when correctness depends on interpretation rather than exact matching alone.

Before You Begin

  • You have a project.
  • You know the scoring criteria you want the judge to apply.

What a Judge Is in Spec27

In the current product flow, a user-facing judge is a project-local configuration built on top of a built-in judge version.

That configuration lets you choose:

  • the team flow
  • the built-in judge version
  • optional shared context
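As a rough mental model, those three choices can be pictured as a small record. The sketch below is illustrative only: `JudgeConfig`, its field names, and the placeholder values are assumptions made for this example, not Spec27's actual schema or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class JudgeConfig:
    """Hypothetical model of a project-local judge; not a Spec27 type."""

    title: str
    team_flow: str                          # the team flow chosen for the judge
    builtin_version: str                    # the built-in judge version it builds on
    shared_context: Optional[str] = None    # optional project-specific guidance

    def has_shared_context(self) -> bool:
        # Treat missing or whitespace-only context as "no shared context".
        return bool(self.shared_context and self.shared_context.strip())


# Placeholder values; real team flows and judge versions come from the product.
judge = JudgeConfig(
    title="Refund policy adherence",
    team_flow="example-team-flow",
    builtin_version="example-judge-version",
)

judge_with_context = JudgeConfig(
    title="Refund policy adherence",
    team_flow="example-team-flow",
    builtin_version="example-judge-version",
    shared_context="Score as incorrect any answer that promises refunds after 30 days.",
)
```

Only the shared context is optional here, which mirrors the list above: the team flow and the built-in judge version are always chosen.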

When to Use a Judge

Use a judge when an output cannot be evaluated reliably with exact string matching or a fixed list of accepted values.

Judge-based scoring is helpful when you need to evaluate:

  • nuanced correctness
  • policy adherence
  • response quality in context
  • red-team or failure-seeking behavior
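To see why the simpler checks fall short in these cases, consider a correct paraphrase. The following is a minimal sketch in plain Python; the helper names and sample strings are made up for illustration and are not part of Spec27.

```python
def exact_match(output: str, expected: str) -> bool:
    """Pass only if the output equals the expected text (ignoring outer whitespace)."""
    return output.strip() == expected.strip()


def in_accepted_values(output: str, accepted: set[str]) -> bool:
    """Pass only if the output is one of a fixed set of lowercase accepted values."""
    return output.strip().lower() in accepted


expected = "Refunds are available within 30 days of purchase."
paraphrase = "You can get your money back if you ask within 30 days of buying."

# Both deterministic checks reject the paraphrase even though a human reader
# would consider it correct. Outputs like this are candidates for a judge.
needs_judge = (
    not exact_match(paraphrase, expected)
    and not in_accepted_values(paraphrase, {expected.lower()})
)
```

When an output has a single canonical form, or a short closed set of valid answers, the deterministic checks remain the simpler choice.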

Create a Judge

  1. Open Judges inside a project.
  2. Create a judge with a clear title and the right team flow.
  3. Choose the built-in judge version that fits the job.
  4. Add shared context if the judge needs project-specific scoring guidance.
  5. Save the judge and open the detail page.

Test a Judge Before Reusing It

  1. Use the judge test flow with sample input and output.
  2. Review the returned score.
  3. Review the explanation.
  4. Review any vote details that are shown.
  5. Refine the shared context or judge choice if the scoring does not match your intended criteria.

Testing the judge first helps you confirm that the scoring behavior is stable before you attach it to a specification.
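One informal way to check stability is to run the same sample through the judge test flow several times, note the scores, and look at how far apart they are. The sketch below assumes numeric scores that you have recorded by hand; the function name, the sample values, and the `max_spread` threshold are all hypothetical and do not reflect how Spec27 reports or scales scores.

```python
from statistics import mean


def summarize_scores(scores: list[float], max_spread: float = 0.1) -> dict:
    """Summarize repeated judge scores for one sample input and output.

    `max_spread` is an arbitrary illustrative threshold, not a product setting.
    """
    if not scores:
        raise ValueError("at least one score is required")
    spread = max(scores) - min(scores)
    return {
        "mean": mean(scores),
        "spread": spread,
        "stable": spread <= max_spread,
    }


# Scores noted from repeated test runs of the same sample (made-up values).
consistent = summarize_scores([0.8, 0.8, 0.9])
inconsistent = summarize_scores([0.2, 0.9, 0.5])
```

A wide spread on identical input suggests the shared context is ambiguous or the chosen judge version does not fit the task, which is the cue to refine either one as described in step 5.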

Use a Judge in Broader Workflows

Once the judge behaves as expected, use it in a judge-based specification or evaluation workflow.

Important Notes

  • Judge tests help validate scoring behavior before you use the judge in a broader workflow.
  • Judge detail pages show model-level information for the selected built-in judge version.
  • Red-team work typically relies on judge-based scoring.
  • Judge-based scoring is best for nuanced correctness checks.