Troubleshooting

A run did not start

Check:

  • the eval has at least one linked agent
  • the eval has at least one linked specification
  • any dependent datasets or preparation steps are complete
  • organization usage limits have not blocked the work
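
The checklist above can be read as a set of preconditions. The following is only an illustrative sketch: `blocking_reasons` and the dictionary keys it reads (`linked_agents`, `linked_specifications`, `dependencies`, `usage_limit_reached`) are hypothetical names, not part of any product API. You would fill them in by hand from what the eval page shows.

```python
def blocking_reasons(eval_state):
    """Return the reasons a run would not start, in checklist order.

    `eval_state` is a hypothetical dict describing what you see in the UI;
    none of these keys come from a real API.
    """
    reasons = []
    if not eval_state.get("linked_agents"):
        reasons.append("no linked agent")
    if not eval_state.get("linked_specifications"):
        reasons.append("no linked specification")
    # `dependencies` maps a dataset or preparation step name to True when complete.
    pending = [
        name
        for name, done in eval_state.get("dependencies", {}).items()
        if not done
    ]
    if pending:
        reasons.append("incomplete dependencies: " + ", ".join(sorted(pending)))
    if eval_state.get("usage_limit_reached", False):
        reasons.append("organization usage limit reached")
    return reasons


state = {
    "linked_agents": ["support-agent"],
    "linked_specifications": [],
    "dependencies": {"dataset": True, "prep": False},
    "usage_limit_reached": False,
}
print(blocking_reasons(state))
```

An empty list means none of the four checks explains the problem, so the cause is likely elsewhere.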

Agent preview is blocked

Check:

  • whether the agent requires Secrets that have not been configured
  • whether the secret key names match what the agent expects
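
Key-name mismatches are often a matter of letter case or stray whitespace. As a minimal sketch, assuming you have copied both lists of names out by hand, the comparison below reports which expected keys are missing and whether a configured key looks like a near miss. The function `diff_secret_keys` and the example key names are hypothetical. The assumption that names are compared exactly (case-sensitively) is also mine, not something this page states.

```python
def diff_secret_keys(required, configured):
    """Compare the key names an agent expects with the ones that are configured.

    Returns (missing, near_misses):
      missing      sorted list of required names with no exact match
      near_misses  dict mapping a missing name to a configured name that
                   matches once case and surrounding whitespace are ignored
    """
    required_set = set(required)
    configured_set = set(configured)
    missing = sorted(required_set - configured_set)

    # Index the unmatched configured names by a normalized form.
    normalized = {}
    for name in sorted(configured_set - required_set):
        normalized.setdefault(name.strip().lower(), name)

    near_misses = {
        name: normalized[name.strip().lower()]
        for name in missing
        if name.strip().lower() in normalized
    }
    return missing, near_misses


missing, near_misses = diff_secret_keys(
    ["API_TOKEN", "DB_URL"],      # what the agent expects (example names)
    ["api_token", "DB_URL"],      # what is configured (example names)
)
print(missing)       # ['API_TOKEN']
print(near_misses)   # {'API_TOKEN': 'api_token'}
```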

I do not see an asset I expected to see

Check:

  • the active organization
  • whether the asset belongs to a different project
  • whether ownership or visibility rules are filtering it out

A specification is stuck in the preparing state or has failed

Check:

  • whether the primary dataset is valid and complete
  • whether the selected attack methods still make sense for the workflow
  • whether the failure state on the specification detail page gives a specific error
  • whether retrying preparation is available from the detail page

A judge-based score looks wrong

Check:

  • the judge configuration
  • the selected built-in judge version
  • any shared context you added
  • the sample input and output you used during judge testing
  • whether judge-based scoring is the right method for the task

My results are hard to interpret

Start with:

  • run status
  • latest step
  • per-row correctness
  • console output or error details

Then compare the run setup against the eval and specification it was created from.

I hit a usage limit sooner than expected

Check:

  • the current organization plan
  • consumed versus reserved units
  • whether previews, specification preparation, or queued runs have already allocated usage in the current week
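
The difference between consumed and reserved units can be made concrete with a little arithmetic. This sketch assumes that reserved units (usage already allocated to previews, specification preparation, or queued runs) count against the same weekly limit as consumed units. That accounting rule, the function name, and the numbers are illustrative assumptions, so check them against your organization's usage page.

```python
def remaining_units(weekly_limit, consumed, reserved):
    """Units still available this week, assuming reserved units count against the limit."""
    return max(weekly_limit - consumed - reserved, 0)


# Example figures only: 600 units consumed and 350 reserved by pending work.
available = remaining_units(weekly_limit=1000, consumed=600, reserved=350)
print(available)  # 50
```

Under this assumption, a plan can look only 60% consumed and still have very little headroom, because pending work has already claimed most of the rest.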