Welcome to Spec27
Spec27 gives you a shared workspace for evaluating LLM-powered agents. It helps you move from one-off prompt checks to reusable evaluation assets, repeatable runs, and results that stay linked to the setup that produced them.
These docs focus on the product workflow you use in the app, not the internal implementation.
What you can do in Spec27
- Create shared Projects for a product area, workflow, or evaluation stream.
- Store test cases in Datasets.
- Save runnable behavior as Agents.
- Add Secrets when agents need protected values.
- Configure Judges when correctness requires interpretation.
- Define reusable Specifications for Gold Team or Red Team work.
- Create repeatable Evals and optional schedules.
- Review Runs, Results, logs, and usage over time.
The core workflow
Most teams move through Spec27 in this order:
- Join or create an organization.
- Create a Project.
- Add Datasets, Agents, and any required Secrets.
- Create Judges if you need judge-based scoring.
- Create a Specification to define the test inputs and attack methods.
- Create an Eval that connects your Agents and Specifications.
- Use Playground for quick runs or start a full eval run.
- Review Results and iterate.
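The order above follows from how the assets depend on each other: an Eval can only run once the Agents, Datasets, and Judges it refers to exist in the Project. The Python sketch below is one way to picture that dependency. It is purely illustrative and is not the Spec27 API or its internal data model. Every class, field, and name in it (`Project`, `Specification`, `Eval`, `missing_assets`, and the sample asset names) is hypothetical, and it assumes that a Specification draws its test inputs from a Dataset and may optionally use a Judge.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Set


class SpecKind(Enum):
    """Hypothetical labels for the two specification types."""
    GOLD_TEAM = "gold_team"
    RED_TEAM = "red_team"


@dataclass
class Specification:
    name: str
    kind: SpecKind
    dataset: str                      # assumed: Dataset that supplies the test inputs
    attack_methods: List[str] = field(default_factory=list)
    judge: Optional[str] = None       # assumed: optional Judge for judge-based scoring


@dataclass
class Eval:
    name: str
    agents: List[str]                 # names of the Agents under evaluation
    specifications: List[Specification]


@dataclass
class Project:
    name: str
    datasets: Set[str] = field(default_factory=set)
    agents: Set[str] = field(default_factory=set)
    judges: Set[str] = field(default_factory=set)

    def missing_assets(self, ev: Eval) -> List[str]:
        """List assets the eval refers to that the project does not contain yet."""
        missing = [f"agent:{a}" for a in ev.agents if a not in self.agents]
        for spec in ev.specifications:
            if spec.dataset not in self.datasets:
                missing.append(f"dataset:{spec.dataset}")
            if spec.judge is not None and spec.judge not in self.judges:
                missing.append(f"judge:{spec.judge}")
        return missing


# Example: the Judge step was skipped, so the eval is not ready to run.
project = Project(
    name="support-bot",
    datasets={"refund-questions"},
    agents={"support-agent-v2"},
)
spec = Specification(
    name="refund-correctness",
    kind=SpecKind.GOLD_TEAM,
    dataset="refund-questions",
    judge="policy-judge",
)
ev = Eval(name="nightly-refunds", agents=["support-agent-v2"], specifications=[spec])
print(project.missing_assets(ev))  # ['judge:policy-judge']
```

In this sketch, adding `"policy-judge"` to `project.judges` makes `missing_assets` return an empty list, which mirrors creating Judges before the Specification and Eval that rely on them.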
Gold Team and Red Team
- Gold Team work focuses on desirable behavior, correctness, and robustness.
- Red Team work focuses on misuse, harmfulness, jailbreaks, and failure-seeking evaluation.
- Both flows use the same core asset model, but they differ in specification type, attack methods, and scoring.
Choose your starting path
- If you want the product overview first, read What Spec27 helps you do.
- If you want a guided walkthrough, read Onboarding example.
- If you want to build your own setup, read Quickstart.
- If you want the asset model first, read Mental model.