Mental Model

Spec27 is easiest to understand as a set of reusable assets that come together in a specification. You connect an agent, prepare a specification, run it from an eval, and review the results later — and because saved specifications are immutable, those results stay tied to the exact setup that produced them.

Organization

The organization is the top-level shared workspace. Memberships, roles, invite links, and usage limits live here.

Project

The project is the main container for one evaluation stream. It holds the assets and results that belong together.

Assets inside a project

Agents hold the runnable logic you want to test. You connect them through Agent Builder, through a registry integration, or by writing the code.
Secrets provide protected runtime values when an agent needs them.
Specifications are the central authoring unit. A specification brings together its test entries, evaluation method, judging configuration, and adversarial coverage from attack methods.
Evals pair agents with specifications into a saved, runnable setup.

You author entries and scoring directly inside the specification. Datasets and judges are internal building blocks that support the specification, but you do not manage them separately.

The specification is the unit of meaning

A specification separates what you are preparing from what has been run:

Editing a specification updates a working draft.
Saving it creates an immutable version.
An eval runs a pinned version, so editing the draft later never changes the meaning of a past run. You bump the eval to a new version when you want it to pick up your changes.

This is what lets you ask "what exactly was run?" after the fact and get a stable answer. See Specifications for the full flow.
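The draft, save, and pin behaviour described above can be illustrated with a small Python sketch. This is a toy model, not Spec27's API: the class names `SpecVersion`, `Specification`, and `Eval`, the `save` and `bump` methods, and the agent name are all hypothetical and chosen only to mirror the terms on this page.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SpecVersion:
    """An immutable snapshot of a specification (hypothetical shape)."""
    number: int
    entries: tuple[str, ...]


@dataclass
class Specification:
    """A mutable working draft plus the immutable versions saved from it."""
    draft_entries: list[str] = field(default_factory=list)
    versions: list[SpecVersion] = field(default_factory=list)

    def save(self) -> SpecVersion:
        # Copy the draft into a tuple so later draft edits cannot reach it.
        version = SpecVersion(len(self.versions) + 1, tuple(self.draft_entries))
        self.versions.append(version)
        return version


@dataclass
class Eval:
    """Pairs an agent with one pinned specification version."""
    agent: str
    pinned: SpecVersion

    def bump(self, version: SpecVersion) -> None:
        # Explicit opt-in: the eval changes only when it is re-pinned.
        self.pinned = version


spec = Specification(draft_entries=["What is 2 + 2?"])
v1 = spec.save()
nightly = Eval(agent="support-bot", pinned=v1)

# Editing the draft does not touch the version the eval is pinned to.
spec.draft_entries.append("Summarise this ticket.")
assert nightly.pinned.entries == ("What is 2 + 2?",)

# Saving again creates version 2; the eval picks it up only after a bump.
v2 = spec.save()
assert nightly.pinned.number == 1
nightly.bump(v2)
assert nightly.pinned.number == 2
```

In this sketch, a past run that recorded `v1` can always be traced back to exactly one entry, no matter how the draft changes afterwards.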

Runs and results

A run is a single execution of an eval (or a quick agent preview).
Results are the recorded outputs, scores, statuses, and logs from that run, summarised as clean and robust accuracy.
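As a rough illustration of how a clean and robust accuracy summary might be derived from recorded results, the sketch below assumes that clean accuracy is the pass rate on unmodified entries and robust accuracy is the pass rate on entries perturbed by an attack method. That definition is an assumption borrowed from common usage, not something this page specifies, and the `Result` shape and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One recorded outcome from a run (hypothetical shape)."""
    entry_id: str
    passed: bool
    attacked: bool  # True if an attack method perturbed this entry


def accuracy(results: list[Result]) -> float:
    """Fraction of results that passed; 0.0 for an empty list."""
    if not results:
        return 0.0
    return sum(r.passed for r in results) / len(results)


def summarise(results: list[Result]) -> dict[str, float]:
    """Split results by attack status and score each group separately."""
    clean = [r for r in results if not r.attacked]
    attacked = [r for r in results if r.attacked]
    return {
        "clean_accuracy": accuracy(clean),
        "robust_accuracy": accuracy(attacked),
    }


results = [
    Result("e1", passed=True, attacked=False),
    Result("e2", passed=True, attacked=False),
    Result("e1", passed=True, attacked=True),
    Result("e2", passed=False, attacked=True),
]
summary = summarise(results)
print(summary)  # {'clean_accuracy': 1.0, 'robust_accuracy': 0.5}
```

Here the agent answers both unmodified entries correctly but fails one of the two attacked variants, so the two numbers diverge.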

Gold Team and Red Team use the same workflow

Gold Team focuses on desirable behaviour, correctness, and robustness — including goal-based multi-turn tasks.
Red Team focuses on misuse, harmfulness, jailbreaks, and failure-seeking evaluation — including multi-turn adversarial probing.

Both move through the same chain of assets, specifications, evals, runs, and results.

The workflow at a glance

Organization → Project → Agent + Specification (entries · method · judge · attacks) → Eval → Run → Results
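To make the chain concrete, here is a minimal Python sketch that walks it end to end. It is a simplified model under stated assumptions, not Spec27's data model: every class, field, and function name is hypothetical, the agent is a stand-in function, the judge is a plain exact-match check, and attack methods and secrets are left out.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Specification:
    """A saved specification version: entries plus a judge (simplified)."""
    entries: tuple[tuple[str, str], ...]   # (prompt, expected answer)
    judge: Callable[[str, str], bool]      # (output, expected) -> passed?


@dataclass(frozen=True)
class Eval:
    """A saved pairing of an agent with a specification."""
    agent: Callable[[str], str]
    specification: Specification


@dataclass
class Project:
    name: str
    evals: dict[str, Eval] = field(default_factory=dict)


@dataclass
class Organization:
    name: str
    projects: dict[str, Project] = field(default_factory=dict)


def run(eval_: Eval) -> list[dict]:
    """Execute every entry once and record its output and status."""
    results = []
    for prompt, expected in eval_.specification.entries:
        output = eval_.agent(prompt)
        passed = eval_.specification.judge(output, expected)
        results.append(
            {"prompt": prompt, "output": output, "status": "pass" if passed else "fail"}
        )
    return results


def echo_agent(prompt: str) -> str:
    """Stand-in agent that upper-cases its input."""
    return prompt.upper()


def exact_match(output: str, expected: str) -> bool:
    return output == expected


# Organization -> Project -> Agent + Specification -> Eval -> Run -> Results
spec = Specification(entries=(("hi", "HI"), ("bye", "Bye")), judge=exact_match)
org = Organization("acme")
org.projects["support"] = Project("support")
org.projects["support"].evals["smoke"] = Eval(agent=echo_agent, specification=spec)

results = run(org.projects["support"].evals["smoke"])
statuses = [r["status"] for r in results]
print(statuses)  # ['pass', 'fail']
```

The second entry fails because the stand-in agent returns `"BYE"` while the specification expects `"Bye"`, which shows how the judge, not the agent, decides the recorded status.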
