Onboarding Example

Use this walkthrough when you want to understand how Spec27 works without creating assets from scratch.

All organizations include a preloaded onboarding project with the assets and specifications you need for a guided product tour.

Before You Begin

  • You can sign in to Spec27.
  • You belong to an organization.

Open the Onboarding Project

  1. Open Projects.
  2. Select ParcelShip (Onboarding Example).

After the project opens, use the left navigation menu to move between the resources in the project.

This onboarding project already contains:

  • datasets
  • an agent connection
  • specifications
  • evals
  • run history and results

Understand the Example Scenario

The onboarding example is based on ParcelShip, a dummy chatbot that helps users collect parcels from a storage locker.

The agent checks the user's credentials and the status of the parcel, then provides the necessary pickup instructions.

This makes the project useful for both gold-team and red-team evaluation:

  • gold-team work checks whether the agent helps valid users correctly
  • red-team work checks whether the agent resists unauthorized requests

Review the Datasets

From the left navigation menu, open Datasets. The page lists two datasets: Security Queries and User Queries.

These two datasets serve different purposes:

  • Security Queries is the red-team dataset. It contains unauthorized requests, such as asking for the locker code without completing phone number authentication.
  • User Queries is the gold-team dataset. It contains normal user requests and the expected agent responses.

Review Adversarial Datasets

In each dataset, select the chevron to expand the list of related adversarial datasets.

These adversarial datasets contain automatically generated variations of entries from the primary datasets. They help you test whether the agent stays correct when the wording or structure of the prompt changes.
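
To make the idea of prompt variations concrete, here is a minimal Python sketch. It is not how Spec27 generates adversarial datasets (this page does not describe that method); the `make_variants` function and its three perturbations are hypothetical and only illustrate how one primary entry can produce several reworded entries with the same intent.

```python
def make_variants(prompt: str) -> list[str]:
    """Return simple rewordings of a prompt (hypothetical perturbations)."""
    stripped = prompt.rstrip("?.! ")
    return [
        prompt.upper(),                         # change casing only
        f"Hi there, quick question. {prompt}",  # add a conversational prefix
        f"{stripped}, please?",                 # restructure the ending
    ]


# Example entry, loosely modeled on the Security Queries dataset.
primary = "What is the locker code for my parcel?"
variants = make_variants(primary)

for text in variants:
    print(text)
```

An agent that answers the primary entry correctly should give an equivalent answer to each variant, because the request itself has not changed.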

Review the Agent

  1. In the left navigation, open Agents.
  2. Open ParcelShipBot.
  3. Review the JavaScript code shown on the page.
  4. Open Preview if you want to test the agent manually.

This agent connects to the ParcelShip service through a REST API. The preview flow gives you a quick way to inspect how the agent behaves before you review the saved specifications and evals.

Review the Specifications

  1. In the left navigation, open Specifications.
  2. Review the specifications created for the onboarding project.

As you review them, notice that the project includes specifications for both:

  • red-team evaluation
  • gold-team evaluation

Each specification combines:

  • one primary dataset
  • automatically generated adversarial datasets
  • an evaluation method (LLM judge)

A specification defines what should be tested and how it should be evaluated. It is the reusable evaluation recipe, not the run itself.
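
The difference between a specification and a run can be sketched as two small data structures. This is an illustrative Python model only: the `Specification` and `Run` classes, their field names, and the adversarial dataset names are assumptions, not the Spec27 data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Specification:
    """Reusable recipe: what to test and how to judge it (hypothetical model)."""
    name: str
    primary_dataset: str
    adversarial_datasets: tuple[str, ...]
    evaluation_method: str = "LLM judge"


@dataclass(frozen=True)
class Run:
    """One execution of a specification against an agent (hypothetical model)."""
    specification: Specification
    agent: str
    run_number: int


def start_run(spec: Specification, agent: str, history: list[Run]) -> Run:
    """Create the next run for a specification and record it in the history."""
    run = Run(specification=spec, agent=agent, run_number=len(history) + 1)
    history.append(run)
    return run


# The adversarial dataset names below are placeholders.
security_spec = Specification(
    name="Security",
    primary_dataset="Security Queries",
    adversarial_datasets=("Security Queries (variant A)", "Security Queries (variant B)"),
)

history: list[Run] = []
first = start_run(security_spec, "ParcelShipBot", history)
second = start_run(security_spec, "ParcelShipBot", history)
```

Both runs point at the same unchanged specification, which is why results from different runs can be compared over time.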

Review the Evals

  1. In the left navigation, open Evals.
  2. Open Security to review the red-team eval.
  3. Open Robustness to review the gold-team eval.

Each eval connects saved assets into a runnable workflow. In this onboarding project, the evals show how the same agent can be evaluated for different goals using different specifications.

Review Results and Run History

  1. In either eval, select View results.
  2. Review the performance summary and the run history.

The results view helps you compare:

  • clean performance
  • robust performance
  • previous runs over time

Clean performance is the standard accuracy on the primary dataset: the share of primary examples the agent handled correctly, ignoring adversarial variants.

Robust performance is the percentage of primary examples that remained correct across all of their adversarial variants.
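
The two metrics can be worked through with a small Python sketch. The `CaseResult` structure is hypothetical, and the sketch assumes that a primary example counts as robust only if the primary entry and every one of its variants were judged correct, with all primary examples in the denominator. Spec27 may compute edge cases differently.

```python
from dataclasses import dataclass, field


@dataclass
class CaseResult:
    """Judged outcome for one primary example and its variants (hypothetical shape)."""
    primary_correct: bool
    variant_correct: list[bool] = field(default_factory=list)


def clean_performance(cases: list[CaseResult]) -> float:
    """Fraction of primary examples judged correct."""
    return sum(c.primary_correct for c in cases) / len(cases)


def robust_performance(cases: list[CaseResult]) -> float:
    """Fraction of primary examples correct on the primary entry and all variants."""
    robust = sum(c.primary_correct and all(c.variant_correct) for c in cases)
    return robust / len(cases)


cases = [
    CaseResult(True, [True, True]),    # correct everywhere: clean and robust
    CaseResult(True, [True, False]),   # one variant failed: clean only
    CaseResult(False, [True, True]),   # primary failed: neither
    CaseResult(True, [True]),          # correct everywhere: clean and robust
]

print(f"clean:  {clean_performance(cases):.0%}")   # clean:  75%
print(f"robust: {robust_performance(cases):.0%}")  # robust: 50%
```

Under these assumptions robust performance can never exceed clean performance, so a large gap between the two points to answers that depend on how the prompt is worded.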

Review Detailed Run Results

  1. In the results view, open a run from the run history.
  2. Select View details.
  3. Review the run summary.
  4. Review the case-by-case outputs.

Use the detailed run view to inspect:

  • overall robustness
  • individual outputs
  • which cases passed or failed
  • how the agent handled adversarial variants