Specifications

A specification is the central authoring unit in Spec27. It brings together everything needed to evaluate a behaviour: the primary entries, the evaluation method, any scoring configuration, and the adversarial coverage. You prepare a specification once and reuse it across many evals.

Choose a specification type

When you create a specification, you first pick its team flow on the Choose Specification Type page, so that the create form shows only the settings for that flow:

Gold Team — desirable behaviour, correctness, and robustness.
Red Team — misuse, failure modes, and attack-oriented evaluation.
Goal Gold Team — simulator-driven multi-turn goal completion.

Author the specification

The create form is organised into sections:

Specification details — a Name and optional Additional context that evaluators should apply. The type is fixed once chosen (with a Change type link if you picked wrong).
Entries — author, import, or generate test case entries in the Entries tab. These entries define the scope of the spec. Red Team specs include a PII attestation for the entries. Use dataset generation to help populate your entries.
Evaluation settings — choose how outputs are scored:
- Evaluation method — Strict equality, Permitted values, or Judge. For Permitted values you list the accepted values, one per line. Red Team is always Judge scored.
- Execution mode — Batch or Iterative, when applicable.
- Judge — the judge configuration to use when the method is Judge. See evaluation methods for scoring options.
- For goal-based specs and multi-turn Red Team specs, a User Simulator, plus Max turns to cap the length of the conversation. Some Red Team attack methods, such as Adaptive LotL, generate their own follow-up turns instead of using the selected user simulator.
Attack coverage — define adversarial variants:
- Gold Team specs select individual attack methods, grouped by category. Leave them unchecked to run only the primary entries, with no robustness score.
- Red Team specs pick an attack suite — Light Suite or Heavy Suite — instead of individual methods.
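The type-specific rules above can be summarised as a validation check. This is a minimal sketch that assumes a hypothetical in-code representation of the form: `SpecSettings`, `validate`, and `ATTACK_SUITES` are illustrative names, not a Spec27 API. It checks only the constraints stated on this page, and it treats picking a suite as required for Red Team specs, which is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical constant mirroring the suite names listed on this page.
ATTACK_SUITES = ("Light Suite", "Heavy Suite")


@dataclass
class SpecSettings:
    """Hypothetical stand-in for the create form; field names are illustrative only."""
    spec_type: str  # "Gold Team", "Red Team", or "Goal Gold Team"
    method: str  # "Strict equality", "Permitted values", or "Judge"
    attack_methods: List[str] = field(default_factory=list)  # Gold Team coverage
    attack_suite: Optional[str] = None  # Red Team coverage


def validate(s: SpecSettings) -> List[str]:
    """Return the rule violations found; an empty list means the settings are consistent."""
    errors: List[str] = []
    if s.spec_type == "Red Team":
        if s.method != "Judge":
            errors.append("Red Team specs are always Judge scored")
        if s.attack_methods:
            errors.append("Red Team specs pick an attack suite, not individual methods")
        # Assumption: a Red Team spec must pick one of the two suites.
        if s.attack_suite not in ATTACK_SUITES:
            errors.append("Red Team specs need the Light Suite or Heavy Suite")
    elif s.spec_type == "Gold Team" and s.attack_suite is not None:
        errors.append("Gold Team specs select individual attack methods, not a suite")
    # An empty attack_methods list on a Gold Team spec is valid: only the
    # primary entries run and no robustness score is produced.
    # Goal Gold Team attack coverage is not described here, so it is not checked.
    return errors


print(validate(SpecSettings("Red Team", "Strict equality", attack_suite="Light Suite")))
# ['Red Team specs are always Judge scored']
print(validate(SpecSettings("Gold Team", "Judge")))  # []
```

Leaving `attack_methods` empty on a Gold Team spec passes the check, which matches the rule that such a spec runs only its primary entries.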

Select Create Specification to save.
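For the two evaluation methods that do not use a judge, the pass/fail decision is simple to sketch. The functions below are hypothetical illustrations, not Spec27 code. Whether Spec27 trims whitespace or ignores case when comparing outputs is not stated on this page, so this sketch assumes exact, case-sensitive matching and strips whitespace only from the lines of the Permitted values list. Judge scoring depends on the judge configuration and is not modelled.

```python
from typing import List


def parse_permitted_values(text: str) -> List[str]:
    """Turn the one-per-line Permitted values box into a list.

    Assumption: surrounding whitespace is stripped and blank lines are skipped.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


def strict_equality(output: str, expected: str) -> bool:
    # Pass only on an exact, case-sensitive match (assumed semantics).
    return output == expected


def permitted_values(output: str, accepted: List[str]) -> bool:
    # Pass if the output exactly matches any accepted value (assumed semantics).
    return output in accepted


accepted = parse_permitted_values("yes\nno\n\nunsure\n")
print(accepted)                          # ['yes', 'no', 'unsure']
print(permitted_values("no", accepted))  # True
print(strict_equality("Yes", "yes"))     # False
```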

Editing, versions, and runs

A specification separates what you are preparing from what has been run:

Editing prepares a draft. Updating the name, entries, evaluation settings, or attack coverage changes only the working draft of the specification; saved versions are not affected.
Saving makes a version runnable. Saved versions are immutable snapshots of the spec at that moment.
Evals run a pinned version. An eval stays pinned to the version it was set up with, so later edits to the draft do not change the meaning of past runs. To have an eval pick up your latest changes, save them as a new version, then bump the eval to that version from the spec.

This is what lets you reason about "the spec that was run" after the fact — historical results stay tied to the exact configuration that produced them.
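The draft, version, and pinning model described above can be sketched in a few lines. `Spec`, `EvalRun`, `edit`, `save`, and `bump` are hypothetical names chosen for illustration, not Spec27 APIs; the sketch only shows why an eval's results stay tied to the version it was pinned to.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Any, Dict, List, Mapping


@dataclass
class Spec:
    """Hypothetical model: one mutable working draft plus immutable saved versions."""
    draft: Dict[str, Any] = field(default_factory=dict)
    versions: List[Mapping[str, Any]] = field(default_factory=list)

    def edit(self, key: str, value: Any) -> None:
        # Editing changes only the working draft; saved versions are untouched.
        self.draft[key] = value

    def save(self) -> int:
        # Snapshot the draft into a read-only (shallow) copy; return its 1-based number.
        self.versions.append(MappingProxyType(dict(self.draft)))
        return len(self.versions)


@dataclass
class EvalRun:
    """Hypothetical eval that stays pinned to one saved version of a spec."""
    spec: Spec
    version: int

    def config(self) -> Mapping[str, Any]:
        return self.spec.versions[self.version - 1]

    def bump(self) -> None:
        # Opt in to the latest saved version; unsaved draft edits are never used.
        self.version = len(self.spec.versions)


spec = Spec()
spec.edit("method", "Strict equality")
run = EvalRun(spec, spec.save())   # pinned to version 1
spec.edit("method", "Judge")       # changes the draft only
print(run.config()["method"])      # Strict equality
spec.save()                        # version 2
run.bump()
print(run.config()["method"])      # Judge
```

A read-only snapshot per save is one way to make versions immutable. The copy here is shallow, so a fuller model would also need to freeze nested values such as lists of entries.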

Track preparation status

After saving, the specification detail page shows its preparation status:

Preparing — Spec27 is preparing the specification (including any adversarial coverage).
Ready — the specification can be used in eval runs.
Failed — preparation hit an error; use Retry preparation to try again.
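The status values above behave like a small state machine. This is a hypothetical sketch of one saved version's preparation lifecycle, assuming that Retry preparation moves a Failed spec back to Preparing, that Ready is terminal for that version, and that only Ready specs can be used in eval runs. `Status`, `transition`, and `can_run_eval` are illustrative names.

```python
from enum import Enum
from typing import Dict, Set


class Status(Enum):
    PREPARING = "Preparing"
    READY = "Ready"
    FAILED = "Failed"


# Hypothetical transition table for one saved version.
TRANSITIONS: Dict[Status, Set[Status]] = {
    Status.PREPARING: {Status.READY, Status.FAILED},
    Status.FAILED: {Status.PREPARING},  # Retry preparation
    Status.READY: set(),  # assumed terminal for this version
}


def transition(current: Status, target: Status) -> Status:
    """Move to target if the table allows it, otherwise raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target


def can_run_eval(status: Status) -> bool:
    # Assumption: only a Ready specification can be used in eval runs.
    return status is Status.READY


status = Status.PREPARING
status = transition(status, Status.FAILED)
status = transition(status, Status.PREPARING)  # Retry preparation
status = transition(status, Status.READY)
print(can_run_eval(status))  # True
```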

The detail page also shows the evaluation method, selected entries, selected attack methods, additional context, and a link to results.
