Runs and Results

Use Results to understand what happened during a run and what to change next.

Before You Begin

  • You have started at least one run.

Open a Run Detail Page

  1. Open the run detail page from Evals, Playground, or the project Results area.
  2. Review the run status and latest step.
  3. Check how many results completed successfully or failed.
  4. Inspect per-row outputs, correctness, and any scoring details.
  5. Review console output or error details when present.
  6. Export results when you need a CSV copy for analysis outside Spec27.
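
Once a run is exported, the CSV can be summarized outside Spec27. The sketch below counts passed, failed, and errored rows using only the Python standard library. The column names (`row_id`, `category`, `correct`, `error`) and the sample data are assumptions made for illustration; check the header row of your actual export and adjust the names to match.

```python
import csv
import io
from collections import Counter

# Hypothetical export contents. The real column names and values in a
# Spec27 CSV export may differ; these are assumptions for illustration.
SAMPLE_EXPORT = """row_id,category,correct,error
1,math,true,
2,math,false,
3,safety,true,
4,safety,,timeout
"""


def summarize(csv_text: str) -> Counter:
    """Count rows as passed, failed, or errored."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["error"]:
            # A non-empty error cell takes precedence over correctness.
            counts["errored"] += 1
        elif row["correct"].strip().lower() == "true":
            counts["passed"] += 1
        else:
            counts["failed"] += 1
    return counts


summary = summarize(SAMPLE_EXPORT)
print(dict(summary))
```

To run this against a real export, read the file with `open(path, newline="")` and pass the file object to `csv.DictReader` instead of the in-memory sample.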

Filter and Inspect Results

Depending on the run, you may be able to filter or inspect by:

  • dataset kind
  • category
  • robustness outcome
  • errors only

Judge-based runs can also include:

  • judge explanation
  • judge votes

Use these views to narrow the result set and focus on the cases that matter most, such as failures, specific categories, or robustness issues.
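
The same narrowing can be reproduced on exported rows when you want to script it. The following is a minimal sketch, not Spec27's own filtering logic: the `ResultRow` type, its field names, and the example values such as `"adversarial"` and `"not_robust"` are hypothetical and simply mirror the filters listed above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class ResultRow:
    # Hypothetical fields modelled on the filters listed above.
    row_id: int
    dataset_kind: str
    category: str
    robustness_outcome: str
    error: Optional[str] = None


def filter_rows(
    rows: Iterable[ResultRow],
    *,
    dataset_kind: Optional[str] = None,
    category: Optional[str] = None,
    robustness_outcome: Optional[str] = None,
    errors_only: bool = False,
) -> List[ResultRow]:
    """Keep rows that match every filter that was supplied."""
    selected = []
    for row in rows:
        if dataset_kind is not None and row.dataset_kind != dataset_kind:
            continue
        if category is not None and row.category != category:
            continue
        if (robustness_outcome is not None
                and row.robustness_outcome != robustness_outcome):
            continue
        if errors_only and not row.error:
            continue
        selected.append(row)
    return selected


rows = [
    ResultRow(1, "primary", "math", "robust"),
    ResultRow(2, "adversarial", "math", "not_robust"),
    ResultRow(3, "adversarial", "safety", "not_robust", error="timeout"),
    ResultRow(4, "primary", "safety", "robust"),
]

adversarial_math = filter_rows(rows, dataset_kind="adversarial", category="math")
errored = filter_rows(rows, errors_only=True)
print([r.row_id for r in adversarial_math], [r.row_id for r in errored])
```

Filters combine with AND semantics here, so supplying more of them always narrows the result set further.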

Interpret the Outcome

After reviewing the run detail page, you should be able to answer:

  • whether the run completed successfully
  • which cases passed or failed
  • whether failures cluster around a category or dataset type
  • whether logs or judge explanations point to a clear next step
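
One way to check whether failures cluster around a category is to compute a failure rate per category from the per-row results. The sketch below assumes each row has been reduced to a hypothetical `(category, passed)` pair; how you derive those pairs depends on the columns in your own export.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def failure_rate_by_category(
    rows: Iterable[Tuple[str, bool]],
) -> Dict[str, float]:
    """Return the fraction of failed rows for each category."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for category, passed in rows:
        totals[category] += 1
        if not passed:
            failures[category] += 1
    return {category: failures[category] / totals[category] for category in totals}


# Hypothetical (category, passed) pairs for illustration only.
sample_rows = [
    ("math", True),
    ("math", False),
    ("math", False),
    ("safety", True),
    ("safety", True),
]

rates = failure_rate_by_category(sample_rows)
for category, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{category}: {rate:.0%} failed")
```

A category whose rate sits well above the others is a reasonable place to start reading per-row outputs and judge explanations.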

Important Notes

  • Run detail pages can show output, status, correctness, logs, and related assets together.
  • Judge-based runs may include explanations and vote details.
  • Adversarial runs can surface the adversarial inputs as well as the seeded primary inputs.
  • CSV export is useful when you want external review or analysis.