Glossary

Agent

Runnable logic that processes an input.

Attack method

A transformation that creates adversarial variations from primary entries.

Clean accuracy

The percentage of scored primary entries with a correct result.
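As an illustration, the calculation can be sketched as below. The `PrimaryResult` type and its `scored` and `correct` fields are hypothetical names chosen for this example, not part of the product's API, and returning `None` when nothing was scored is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class PrimaryResult:
    """Hypothetical record for the result of one primary entry."""
    scored: bool   # completed execution and judging without an error
    correct: bool  # judged correct; only meaningful when scored is True


def clean_accuracy(results: Iterable[PrimaryResult]) -> Optional[float]:
    """Return the percentage of scored primary results that are correct.

    Returns None when no result was scored (an assumption of this sketch).
    """
    scored = [r for r in results if r.scored]
    if not scored:
        return None
    return 100.0 * sum(1 for r in scored if r.correct) / len(scored)
```

In this sketch, results that errored out are excluded from the denominator rather than counted as incorrect, following the word "scored" in the definition.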

Credit

A plan unit consumed by metered product work.

Credit usage

Credits reserved or consumed by an organization during its current plan period.

Entry

A single test case or example within a specification that defines an input and expected behavior.

Entry category

An optional label that groups related specification entries for analysis.

Eval

A named evaluation setup that connects agents and pinned specification versions.

Eval run

One execution of an eval across its agents and pinned specification versions.

Gold Team

Evaluation work focused on desirable behavior, correctness, and robustness.

Judge

A scoring configuration used when correctness requires interpretation.

Multi-turn eval

An eval in which an agent and a test exchange more than one turn.

Organization

The top-level shared workspace for members, projects, plan limits, and credit usage.

Pinned specification version

The saved specification version selected by an eval.

Plan

The resource limits and credit allowance assigned to an organization.

Plan limit

The maximum resource count or credit usage allowed by a plan.

Playground

An exploratory surface for trying configurations and generating derivative entries.

Primary entry

An original specification entry before an attack method creates variations.

Project

The main container for assets and results.

Project API key

A secret credential that grants API access to all or selected evals in one project.

Red Team

Evaluation work focused on misuse, harmfulness, jailbreaks, and failure-seeking behavior.

Resource limit

The maximum count for one plan-governed organization resource.

Result

A recorded output, score, or status produced during a run.

Robust accuracy

The percentage of evaluated primary entries with a correct primary result and no scored adversarial failure.
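A minimal sketch of this calculation follows. The `Outcome` and `EntryOutcomes` types are hypothetical, and the sketch assumes that "evaluated" means the primary result was scored; the product may define it differently.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class Outcome:
    """Hypothetical result record for a primary entry or one adversarial variation."""
    scored: bool   # completed execution and judging without an error
    correct: bool  # judged correct; only meaningful when scored is True


@dataclass
class EntryOutcomes:
    """Hypothetical grouping of a primary result with its adversarial results."""
    primary: Outcome
    adversarial: List[Outcome] = field(default_factory=list)


def robust_accuracy(entries: Iterable[EntryOutcomes]) -> Optional[float]:
    """Return the percentage of evaluated primary entries that are robust.

    An entry counts as robust when its primary result is correct and none of
    its scored adversarial results is incorrect. Unscored adversarial results
    are ignored. Returns None when no entry was evaluated.
    """
    evaluated = [e for e in entries if e.primary.scored]  # assumed meaning of "evaluated"
    if not evaluated:
        return None
    robust = sum(
        1
        for e in evaluated
        if e.primary.correct
        and not any(a.scored and not a.correct for a in e.adversarial)
    )
    return 100.0 * robust / len(evaluated)
```

Under these assumptions, a single scored adversarial failure is enough to disqualify an entry, while an adversarial variation that errored out does not count against it.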

Run

One product execution that produces a result.

Saved specification version

An immutable saved state of a specification that an eval can pin.

Scored result

A result that completed execution and judging without an error.

Secret

A protected project value used by an agent at runtime.

Specification

A reusable evaluation recipe containing entries, attack methods, and scoring configuration.

Specification run

One execution of a pinned specification version against one agent within an eval run.
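The relationship between an eval run and its specification runs can be sketched as follows. This assumes that every agent in the eval is paired with every pinned specification version, which the definitions suggest but do not state; the function name and dictionary keys are hypothetical.

```python
from itertools import product
from typing import Dict, List, Sequence


def plan_specification_runs(
    agents: Sequence[str],
    pinned_versions: Sequence[str],
) -> List[Dict[str, str]]:
    """List one specification run per (agent, pinned specification version) pair.

    Assumes full pairing of agents with pinned versions within one eval run.
    """
    return [
        {"agent": agent, "pinned_specification_version": version}
        for agent, version in product(agents, pinned_versions)
    ]
```

Under that assumption, an eval with two agents and three pinned specification versions would produce six specification runs per eval run.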
