FAQ
What is a specification in plain language?
A specification is the reusable evaluation recipe. It tells Spec27 what dataset to use, what attack methods or adversarial datasets belong with it, and how outputs should be judged.
When should I use Playground instead of Evals?
Use Playground for fast experimentation. Use Evals for named, repeatable workflows.
Do I need a judge for every evaluation?
No. Use strict equality or permitted values when they are sufficient. Add a judge only when evaluation needs interpretation.
Why is my agent preview blocked?
The most common reason is that the agent requires one or more project Secrets that do not exist yet.
What is the difference between Gold Team and Red Team?
- Gold Team focuses on desirable behavior, correctness, and robustness.
- Red Team focuses on misuse, harmfulness, jailbreaks, and failure-seeking evaluation.
Can the same specification be reused?
Yes. A specification can be used by multiple evals.
Is “tagging” the same thing as dataset categories?
In the current product flow, the user-facing concept is category on a dataset entry. Use categories when you want to group related cases for later analysis.
Who can view usage and plan limits?
Usage visibility is meant for organization roles with admin or billing visibility.
Where should I look after a run finishes?
Open the run detail page first, then review the project’s Results views if you want a broader summary.