Playground vs Evals
Both Playground and Evals run work in Spec27, but they serve different purposes.
Use Playground when you want exploration
Playground is best when you want to:
- try a configuration quickly
- inspect run history while you experiment
- generate derivative or adversarial datasets from an existing flow
- validate that a setup is ready before using it more formally
Use Evals when you want repeatability
Evals are best when you want to:
- save a named evaluation setup
- attach one or more agents to one or more specifications
- run the same setup again later
- review persisted results in project result views
Side-by-side rule of thumb
| Use this | When you need |
|---|---|
| Playground | A fast loop for validation, experimentation, or derivative dataset generation |
| Evals | A saved, reusable setup that teams can rerun and review later |
A practical decision
- Start in Playground when you are still probing whether the setup behaves the way you expect.
- Move to Evals when you want the configuration to become a reusable team workflow.
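The rule of thumb can also be written down as a small checklist function. This is only an illustrative sketch in Python: `Needs`, its field names, and `recommend` are hypothetical and are not part of Spec27. It assumes that any one repeatability need is enough to tip the choice toward Evals.

```python
from dataclasses import dataclass


@dataclass
class Needs:
    """Hypothetical checklist; these field names are not Spec27 settings."""
    save_named_setup: bool = False
    rerun_later: bool = False
    review_persisted_results: bool = False
    shared_team_workflow: bool = False


def recommend(needs: Needs) -> str:
    """Return "Evals" if any repeatability need is set, else "Playground"."""
    wants_repeatability = (
        needs.save_named_setup
        or needs.rerun_later
        or needs.review_persisted_results
        or needs.shared_team_workflow
    )
    return "Evals" if wants_repeatability else "Playground"


# Still probing whether the setup behaves as expected: no repeatability needs yet.
print(recommend(Needs()))

# The configuration should become something the team can rerun later.
print(recommend(Needs(rerun_later=True, shared_team_workflow=True)))
```

With no repeatability needs set, the sketch points to Playground; once any of them applies, it points to Evals.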