Playground vs Evals

Both Playground and Evals run work in Spec27, but they serve different purposes.

Use Playground when you want exploration

Playground is best when you want to:

  • try a configuration quickly
  • inspect a run history for experimentation
  • generate derivative or adversarial datasets from an existing flow
  • validate that a setup is ready before using it more formally

Use Evals when you want repeatability

Evals are best when you want to:

  • save a named evaluation setup
  • attach one or more agents to one or more specifications
  • run the same setup again later
  • review persisted results in project result views

Side-by-side rule of thumb

Use this      When you need
Playground    A fast loop for validation, experimentation, or derivative dataset generation
Evals         A saved, reusable setup that teams can rerun and review later

A practical decision

  • Start in Playground when you are still probing whether the setup behaves the way you expect.
  • Move to Evals when you want the configuration to become a reusable team workflow.
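
The rule of thumb above can be written down as a small checklist. The sketch below is purely illustrative: `Needs`, its field names, and `choose_surface` are hypothetical helpers invented for this example and are not part of any Spec27 API. It assumes that any single repeatability need (a saved setup, a later rerun, persisted results, or a shared team workflow) is enough to prefer Evals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Needs:
    """Hypothetical checklist; field names are illustrative, not Spec27 settings."""
    saved_named_setup: bool = False      # you want a named, saved evaluation setup
    rerun_later: bool = False            # you want to run the same setup again later
    persisted_results: bool = False      # you want results kept in project result views
    shared_team_workflow: bool = False   # the configuration should be reusable by a team


def choose_surface(needs: Needs) -> str:
    """Return "Evals" if any repeatability need is set, otherwise "Playground"."""
    wants_repeatability = any((
        needs.saved_named_setup,
        needs.rerun_later,
        needs.persisted_results,
        needs.shared_team_workflow,
    ))
    return "Evals" if wants_repeatability else "Playground"


# Still probing whether a setup behaves as expected: no repeatability needs yet.
print(choose_surface(Needs()))
# The configuration should become a reusable team workflow.
print(choose_surface(Needs(shared_team_workflow=True)))
```

Treating the needs as an "any of" check mirrors the guidance on this page: Playground is the default while you explore, and the first requirement for repeatability is the signal to move to Evals.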