Automated Test Generation
Spec27 helps teams start with an initial test set, apply adversarial methods, and grow that baseline into broader coverage without rebuilding the workflow manually.
That matters because a small seed dataset is useful for getting started, but it rarely exposes the full range of robustness and failure modes you need to understand before release.
Start with an initial test set
Most teams begin with a primary dataset of representative examples. That initial set becomes the starting point for the rest of the workflow.
From there, Spec27 lets you apply adversarial methods to generate additional pressure and broaden the test surface:
- keep the original dataset as the baseline reference
- run attack methods against that baseline to create extra test cases
- add adversarial coverage that probes robustness under variation and pressure
- reuse the same evaluation setup as the test set grows
This keeps the workflow grounded in the original task while still increasing the breadth of what gets tested.
Use adversarial methods to add extra cases
The goal of automated test generation is not only to create more rows. It is to introduce the kinds of variations and adversarial pressure that reveal whether the agent remains reliable.
Depending on the method, Spec27 can take a known-good or representative test set and expand it with new cases that are harder, more varied, or more failure-seeking than the original inputs.
That helps teams move from "we tested the examples we thought of first" to "we tested the baseline plus the adversarial cases generated from it."
What this looks like in practice
In practice, the flow looks like this:
- create or import an initial test set
- define the specification that uses that dataset
- apply adversarial methods to generate additional cases
- run the expanded evaluation and review the results
This makes the growth in coverage easier to inspect. You still know what the original dataset was, but you also get the broader test set produced by the methods you selected.
Why this is useful
Automated test generation is useful when you want to:
- get beyond a narrow hand-built seed set
- test how robust an agent is under harder or more adversarial inputs
- increase coverage without reauthoring the full workflow by hand
- keep the generated cases tied to the evaluation setup that produced them
Example: expanding a baseline with adversarial coverage
Suppose you start with a dataset of support-policy questions that represent the core behaviors you expect from an agent.
Once that baseline exists, you can apply adversarial methods to produce extra cases that:
- rephrase or distort the original requests
- add distracting or pressure-inducing context
- probe refusal boundaries or unsafe compliance
- reveal cases that the seed set did not cover on its own
That gives the team a broader and more realistic test surface while keeping the baseline and generated cases inside one evaluation workflow.
Future addition: walkthrough video
This page is also a natural place for a future walkthrough video that shows an initial test set being expanded by adversarial methods inside the product.