Onboarding Example

Use this walkthrough when you want to understand how Spec27 works without creating assets from scratch.

All organizations include a preloaded onboarding project with the assets and specifications you need for a guided product tour.

Before You Begin

You can sign in to Spec27.
You belong to an organization.

Open the Onboarding Project

Open Projects.
Select ParcelShip (Onboarding Example).

After the project opens, use the left navigation menu to move between the resources in the project.

This onboarding project already contains:

an agent connection
specifications
evals
run history and results

Understand the Example Scenario

The onboarding example is based on ParcelShip, a dummy chatbot that helps users collect parcels from a storage locker.

The agent checks the user's credentials, the status of the parcel, and provides the necessary pickup instructions

This makes the project useful for both gold-team and red-team evaluation:

gold-team work checks whether the agent helps valid users correctly
red-team work checks whether the agent resists unauthorized requests

Review the Specifications

In the left navigation, open Specifications.
Open one of the specifications to review the entries and evaluation method.
In the specification editor, click the Entries tab to see all entries.

Each specification includes:

Primary entries: Examples of the behavior you want to test (e.g., valid user requests or unauthorized requests)
Adversarial variants: Automatically generated variations of primary entries, grouped under their parent. These help you test whether the agent stays correct when the wording or structure changes.
Schema: The contract for expected outputs
Knowledge: Context the agent should know to evaluate correctly
Evaluation method: How to assess correctness (e.g., LLM judge)

As you review them, notice that the project includes specifications for both red-team evaluation (security) and gold-team evaluation (normal user behavior).

Review the Agent

In the left navigation, open Agents.
Open ParcelShipBot.
Review the JavaScript code shown on the page.
Open Preview if you want to test the agent manually.

This agent connects to the ParcelShip service through a REST API. The preview flow gives you a quick way to inspect how the agent behaves before you review the saved specifications and evals.

Review the Evals

In the left navigation, open Evals.
Open Security to review the red-team eval.
Open Robustness to review the gold-team eval.

Each eval connects saved assets into a runnable workflow. In this onboarding project, the evals show how the same agent can be evaluated for different goals using different specifications.

Review Results and Run History

In either eval, select View results.
Review the performance summary and the run history.

The results view helps you compare:

clean performance
robust performance
previous runs over time

Clean performance is the standard accuracy on the primary entries.

Robust performance is the percentage of primary examples that remained correct across all of their adversarial variants.

Review Detailed Run Results

In the results view, open a run from the run history.
Select View details.
Review the run summary.
Review the case-by-case outputs.

Use the detailed run view to inspect:

overall robustness
individual outputs
which cases passed or failed
how the agent handled adversarial variants

Before You Begin​

Open the Onboarding Project​

Understand the Example Scenario​

Review the Specifications​

Review the Agent​

Review the Evals​

Review Results and Run History​

Review Detailed Run Results​

Related Pages​