Datasets
Use Datasets to define the inputs and expected behavior your evaluations will use.
Before You Begin
- You have a project.
What a Dataset Contains
For the current user-facing workflow, dataset entries typically contain:
input_textexpected_output- optional
category
Use category when you want to group cases into meaningful slices for later analysis.
Dataset Types
- A Primary dataset is the starting point for evaluation.
- An Adversarial dataset is derived from a primary dataset and linked back to it.
- Parent datasets can show their related adversarial datasets in the detail view.
Create a Primary Dataset
- Open Datasets inside a project.
- Create a primary dataset for the workflow you want to evaluate.
- Give the dataset a clear name that reflects the workflow or behavior under test.
- Add entries manually, or use CSV import if you already have source data.
- Review the saved dataset on the detail page.
Add and Review Dataset Entries
Review each entry to confirm that the fields match the evaluation method you plan to use:
input_textcontains the user input or test promptexpected_outputreflects the intended answer or acceptable targetcategoryis used consistently when you want grouped analysis later
Use categories for slices such as product area, failure mode, or scenario type. Consistent categories make the results views easier to interpret.
Import and Export CSV Data
- Use CSV import when you already have a case list outside Spec27.
- Map columns for input text, expected output, and optional category.
- Export CSV when you want to inspect the current state of the dataset outside the app.
CSV import is useful when you are onboarding an existing case set. CSV export is useful when you want to share the current dataset with teammates or review it outside Spec27.
Create Adversarial or Derivative Datasets
Create an adversarial or derivative dataset when you want to preserve the original primary dataset and explore transformed variants separately.
This is useful when you want to:
- keep a stable baseline dataset
- compare baseline and transformed behavior
- organize generated or attack-based cases without mixing them into the original source set
Best Practices
- Keep one dataset focused on one test surface.
- Use categories consistently if you want meaningful result slices later.
- Prefer creating an adversarial dataset over editing the primary dataset when you want to preserve a baseline.
- Expect some delete actions to be blocked when related work depends on the dataset.