Troubleshooting
A run did not start
Check:
- the eval has at least one linked agent
- the eval has at least one linked specification
- any dependent datasets or preparation steps are complete
- organization usage limits have not blocked the work
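The checks above can be sketched as a small pre-flight function. This is only an illustration: `EvalSetup`, its fields, and `run_blockers` are hypothetical names, not part of any documented API, and you would fill them in from what you see on the eval page.

```python
from dataclasses import dataclass, field


@dataclass
class EvalSetup:
    # Hypothetical snapshot of an eval's setup; field names are illustrative only.
    linked_agents: list[str] = field(default_factory=list)
    linked_specifications: list[str] = field(default_factory=list)
    pending_dependencies: list[str] = field(default_factory=list)
    usage_limit_reached: bool = False


def run_blockers(setup: EvalSetup) -> list[str]:
    """Return the reasons a run would not start, in checklist order."""
    blockers = []
    if not setup.linked_agents:
        blockers.append("no linked agent")
    if not setup.linked_specifications:
        blockers.append("no linked specification")
    # Datasets or preparation steps that have not finished yet.
    for name in setup.pending_dependencies:
        blockers.append(f"dependency not complete: {name}")
    if setup.usage_limit_reached:
        blockers.append("organization usage limit reached")
    return blockers
```

An empty list means none of the four checks explains the problem, so the cause is likely elsewhere.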
Agent preview is blocked
Check:
- whether the agent requires Secrets that have not been configured
- whether the secret key names match what the agent expects
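A quick way to tell a missing secret from a misnamed one is to compare the two sets of key names. The sketch below assumes key names are matched exactly (including case and surrounding whitespace), which this guide does not confirm; `diagnose_secrets` and the example key names are hypothetical.

```python
def diagnose_secrets(required: set[str], configured: set[str]) -> dict[str, list[str]]:
    """Split required secret names into missing ones and likely naming mismatches."""
    # Index configured names by a normalized form to spot near matches.
    by_normalized = {name.strip().lower(): name for name in configured}
    missing, mismatched = [], []
    for name in sorted(required):
        if name in configured:
            continue  # exact match, nothing to report
        near = by_normalized.get(name.strip().lower())
        if near is not None:
            mismatched.append(f"{name} (configured as {near})")
        else:
            missing.append(name)
    return {"missing": missing, "mismatched": mismatched}


# Hypothetical key names, for illustration only.
report = diagnose_secrets({"API_KEY", "DB_URL", "TOKEN"}, {"api_key", "TOKEN"})
```

Here `report["missing"]` lists keys to add, and `report["mismatched"]` lists keys that probably exist under a slightly different name.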
I do not see an asset I expected to see
Check:
- the active organization
- whether the asset belongs to a different project
- whether ownership or visibility rules are filtering it out
A specification is stuck in preparing or has failed
Check:
- whether the primary dataset is valid and complete
- whether the selected attack methods are still appropriate for the workflow
- whether the specification detail page shows a specific error for the failed state
- whether retrying preparation is available from the detail page
A judge-based score looks wrong
Check:
- the judge configuration
- the selected built-in judge version
- any shared context you added
- the sample input and output you used during judge testing
- whether judge-based scoring is the right method for the task
My results are hard to interpret
Start with:
- run status
- latest step
- per-row correctness
- console output or error details
Then compare the run setup against the eval and specification.
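If you can copy or export the per-row results, a short summary often makes the pattern easier to see. The sketch below assumes each row is a dictionary with a boolean `correct` key and an optional `error` string; both key names and the `summarize_rows` helper are hypothetical, not a documented export format.

```python
from collections import Counter


def summarize_rows(rows: list[dict]) -> dict:
    """Summarize per-row correctness and the most frequent error messages."""
    total = len(rows)
    correct = sum(1 for row in rows if row["correct"])
    # Count only rows that actually carry an error message.
    errors = Counter(row["error"] for row in rows if row.get("error"))
    return {
        "total": total,
        "correct": correct,
        "accuracy": correct / total if total else 0.0,
        "top_errors": errors.most_common(3),
    }


# Hypothetical rows, for illustration only.
summary = summarize_rows([
    {"correct": True, "error": None},
    {"correct": False, "error": "timeout"},
    {"correct": False, "error": "timeout"},
    {"correct": False, "error": None},
])
```

Rows that are incorrect but carry no error usually point at the task or scoring setup, while repeated error messages point at the run itself.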
I hit a usage limit sooner than expected
Check:
- the current organization plan
- consumed versus reserved units
- whether previews, specification preparation, or queued runs have already allocated usage in the current week
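The arithmetic behind consumed versus reserved units can be sketched as follows. It assumes that reserved units count against the limit in the same way as consumed units, which is an assumption and not something this guide states; `remaining_units` and the numbers are hypothetical.

```python
def remaining_units(limit: int, consumed: int, reserved: int) -> int:
    """Units still available, assuming reserved units count against the limit."""
    return max(limit - consumed - reserved, 0)


# Hypothetical figures: 400 units used, 350 held by queued or in-progress work.
available = remaining_units(limit=1000, consumed=400, reserved=350)
```

Under that assumption, a limit can be reached while consumed usage alone still looks low, because previews, specification preparation, and queued runs have already set units aside.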