Welcome to Spec27

Spec27 is a shared workspace for evaluating LLM-powered agents. It takes you from one-off prompt checks to reusable evaluation assets, repeatable runs, and results that stay tied to the exact setup that produced them.

These docs focus on the product workflow you use in the app, not the internal implementation.

What you can do in Spec27

Create shared Projects for a product area or evaluation stream.
Connect Agents — by describing them, copying a prebuilt integration, or writing the code.
Prepare Specifications, the central authoring unit where you write test entries, define evaluation methods, configure scoring, and apply attack methods for adversarial coverage.
Measure robustness by applying attack methods and comparing clean and robust accuracy.
Evaluate behaviour over a conversation with multi-turn evaluation.
Run repeatable Evals and review Results over time.
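
To make the clean versus robust comparison concrete, here is a minimal sketch in Python. It assumes that clean accuracy is the pass rate on the original test entries and robust accuracy is the pass rate on the attacked variants of those same entries. The function name, the outcome lists, and the gap calculation are all hypothetical illustrations, not Spec27's actual scoring or API.

```python
def accuracy(outcomes):
    """Return the fraction of passing entries in a list of booleans."""
    if not outcomes:
        raise ValueError("no outcomes to score")
    return sum(outcomes) / len(outcomes)


# Hypothetical pass/fail outcomes for the same five test entries.
clean_outcomes = [True, True, True, False, True]      # original inputs
robust_outcomes = [True, False, True, False, False]   # attacked variants

clean_accuracy = accuracy(clean_outcomes)
robust_accuracy = accuracy(robust_outcomes)

# Assumed summary metric: how much accuracy is lost under attack.
robustness_gap = clean_accuracy - robust_accuracy

print(f"clean={clean_accuracy:.2f} robust={robust_accuracy:.2f} gap={robustness_gap:.2f}")
```

A large gap suggests the agent handles the plain cases but breaks under perturbation, which is usually where adversarial coverage is worth extending.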

The core workflow

Most teams move through Spec27 in this order:

Join or create an organization, then create a Project.
Connect an Agent and add any required Secrets.
Create a Specification and author its test entries, evaluation method, and scoring config.
Add adversarial coverage to your Specification if needed.
Create an Eval that pairs your Agents with your Specifications, and run it.
Review Results, compare clean and robust performance, and iterate.

A Specification separates what you are preparing from what has been run. Saving a Specification creates an immutable version, and each Eval runs against a pinned version, so past Results never change meaning when you edit the draft later.
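
The idea of immutable versions and pinned runs can be sketched as a small data model. This is an illustration only: these docs do not describe the internal implementation, so every class, field, and example value below is an assumption made to show the concept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecVersion:
    """Hypothetical immutable snapshot of a specification."""
    number: int
    entries: tuple  # a tuple, so the snapshot cannot be mutated in place


class Specification:
    """Hypothetical editable draft plus its saved versions."""

    def __init__(self, name):
        self.name = name
        self.draft_entries = []
        self.versions = []

    def save(self):
        # Saving copies the draft into a new immutable version.
        version = SpecVersion(len(self.versions) + 1, tuple(self.draft_entries))
        self.versions.append(version)
        return version


@dataclass(frozen=True)
class Eval:
    """Hypothetical eval that pins one exact specification version."""
    agent: str
    spec_version: SpecVersion


spec = Specification("refund-policy")
spec.draft_entries.append("What is the refund window?")
v1 = spec.save()
eval_run = Eval(agent="support-bot", spec_version=v1)

# Editing and re-saving the draft creates version 2 and leaves version 1 alone.
spec.draft_entries.append("Can I refund a gift card?")
v2 = spec.save()

print(eval_run.spec_version.number, len(eval_run.spec_version.entries))
print(v2.number, len(v2.entries))
```

Because the eval holds a reference to a frozen snapshot rather than to the draft, later edits cannot alter what an earlier run was measured against.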

Gold Team and Red Team

Gold Team work focuses on desirable behaviour, correctness, and robustness — including goal-based multi-turn tasks.
Red Team work focuses on misuse, harmfulness, jailbreaks, and failure-seeking evaluation — including multi-turn adversarial probing.

Both use the same core asset model but differ in specification type, attack coverage, and scoring.

Choose your starting path

New to the product? Read What Spec27 helps you do.
Want a guided walkthrough? Read Onboarding example.
Want to build your own setup? Read Quickstart.
Want the asset model first? Read Mental model.
