Specifications

Use Specifications to define the reusable evaluation recipe for a task.

Before You Begin

  • You have a project.
  • You have at least one primary dataset.
  • You know whether you want attack methods, adversarial datasets, or judge-based scoring.

What a Specification Can Include

A specification can include:

  • a primary dataset
  • attack methods
  • adversarial dataset selections
  • an evaluation method
  • optional judge configuration
  • additional context
  • an execution mode

Specifications also carry a team type:

  • Gold Team
  • Red Team
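
The parts listed above can be pictured as a single record. The following is a minimal sketch only: the class names, field names, and the example dataset name are hypothetical and chosen for illustration, and they are not the Spec27 schema or API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class TeamType(Enum):
    GOLD = "Gold Team"
    RED = "Red Team"


class EvaluationMethod(Enum):
    STRICT_EQUALITY = "strict equality"
    PERMITTED_VALUES = "permitted values"
    JUDGE = "judge"


class ExecutionMode(Enum):
    BATCH = "Batch"
    ITERATIVE = "Iterative"


@dataclass(frozen=True)
class Specification:
    """Hypothetical in-memory model of a specification (illustration only)."""

    primary_dataset: str
    team_type: TeamType
    evaluation_method: EvaluationMethod
    execution_mode: ExecutionMode
    attack_methods: Tuple[str, ...] = ()        # zero or more attack methods
    adversarial_datasets: Tuple[str, ...] = ()  # optional adversarial selections
    judge_configuration: Optional[str] = None   # optional judge configuration
    additional_context: str = ""


# Example: a Gold Team specification with no attacks and exact-match scoring.
# "support-tickets-v1" is a made-up dataset name.
spec = Specification(
    primary_dataset="support-tickets-v1",
    team_type=TeamType.GOLD,
    evaluation_method=EvaluationMethod.STRICT_EQUALITY,
    execution_mode=ExecutionMode.BATCH,
)
```

The record is frozen in this sketch to reflect that a specification is a fixed recipe that runs refer to, rather than something a run modifies.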

Create a Specification

  1. Open Specs inside a project.
  2. Choose the team flow (Gold Team or Red Team) before configuring anything else.
  3. Choose the primary dataset you want to test.
  4. Set the evaluation method:
    • strict equality
    • permitted values
    • judge
  5. Add any attack methods you want Spec27 to use.
  6. If relevant, include adversarial dataset selections.
  7. If the workflow is judge-based, choose the judge configuration that should score outputs.
  8. Save the specification and open the detail page.
  9. Review the status, datasets, linked eval usage, and results summary.
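
The three evaluation methods in step 4 can be contrasted with a small scoring function. This is a sketch under stated assumptions, not Spec27's scoring logic: it assumes strict equality means an exact string match, permitted values means membership in an allowed set, and judge means delegating the decision to a callable. The function name and parameters are hypothetical.

```python
from typing import Callable, Iterable, Optional


def score_output(
    method: str,
    output: str,
    expected: Optional[str] = None,
    permitted: Iterable[str] = (),
    judge: Optional[Callable[[str], bool]] = None,
) -> bool:
    """Return True if the output passes under the chosen evaluation method."""
    if method == "strict equality":
        # Assumption: pass only on an exact match with the expected value.
        return output == expected
    if method == "permitted values":
        # Assumption: pass if the output is one of the allowed values.
        return output in set(permitted)
    if method == "judge":
        # Assumption: the judge is any callable that returns a verdict.
        if judge is None:
            raise ValueError("judge-based scoring needs a judge")
        return judge(output)
    raise ValueError(f"unknown evaluation method: {method!r}")


# Usage with a stand-in judge that passes any output containing a refusal.
def refuses(text: str) -> bool:
    return "cannot help" in text.lower()


exact = score_output("strict equality", "42", expected="42")
allowed = score_output("permitted values", "no", permitted=["yes", "no"])
judged = score_output("judge", "I cannot help with that.", judge=refuses)
```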

Review Preparation Status and Execution Mode

Specification status can move through:

  • Preparing
  • Ready
  • Failed

Execution mode can be:

  • Batch
  • Iterative

Check the specification detail page after saving so you can confirm that the configuration is valid and ready to use in an eval.
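
The check described above can be expressed as a simple guard. This sketch assumes that only a specification in the Ready status should be attached to an eval; the enum and function names are hypothetical.

```python
from enum import Enum


class SpecStatus(Enum):
    PREPARING = "Preparing"
    READY = "Ready"
    FAILED = "Failed"


def can_use_in_eval(status: SpecStatus) -> bool:
    """Assumption: only a Ready specification is usable in an eval."""
    return status is SpecStatus.READY


def next_step(status: SpecStatus) -> str:
    """Suggest what to do for each status (illustrative wording only)."""
    if status is SpecStatus.PREPARING:
        return "wait and check the detail page again"
    if status is SpecStatus.FAILED:
        return "review the configuration and fix the problem"
    return "use the specification in an eval"
```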

Reuse a Specification Across Runs

A specification describes what should be tested and how it should be scored. It is not the run itself.

Because of that separation, you can reuse one specification across multiple evals and multiple runs without recreating the evaluation recipe each time.
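
The separation between a specification and a run can be sketched as two record types, where many runs point at one unchanged specification. All names here, including the target labels, are hypothetical and do not describe how Spec27 stores runs.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Spec:
    """The recipe: what is tested and how it is scored."""

    name: str
    evaluation_method: str


@dataclass
class Run:
    """One execution of a recipe against a particular target."""

    run_id: int
    spec: Spec
    target: str


_run_ids = count(1)  # simple local id generator for the sketch


def start_run(spec: Spec, target: str) -> Run:
    # The run references the specification; it does not copy or alter it.
    return Run(run_id=next(_run_ids), spec=spec, target=target)


# One specification reused for two runs.
spec = Spec(name="refusal-check", evaluation_method="judge")
runs = [start_run(spec, target) for target in ("model-a", "model-b")]
```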

Important Notes

  • A specification can be reused across multiple evals.
  • A specification can include multiple attack methods for the same primary dataset.
  • Plain-language definition: a specification is the evaluation recipe, not the run itself.
  • Red Team specifications use judge-based evaluation.
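
The note about Red Team specifications can be read as a validation rule. The sketch below encodes it, plus one additional rule that is an assumption rather than something this page states: that a judge-based specification needs a judge configuration. The function name and messages are hypothetical.

```python
from typing import List, Optional


def validate_spec(
    team_type: str,
    evaluation_method: str,
    judge_configuration: Optional[str] = None,
) -> List[str]:
    """Return a list of problems; an empty list means no problem was found."""
    problems: List[str] = []
    # From the notes above: Red Team specifications use judge-based evaluation.
    if team_type == "Red Team" and evaluation_method != "judge":
        problems.append("Red Team specifications use judge-based evaluation")
    # Assumption (not stated in the source): a judge needs a configuration.
    if evaluation_method == "judge" and not judge_configuration:
        problems.append("judge-based evaluation needs a judge configuration")
    return problems


ok = validate_spec("Red Team", "judge", judge_configuration="default-judge")
bad = validate_spec("Red Team", "strict equality")
```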