Agents

Use Agents to store the runnable logic you want Spec27 to evaluate.

An agent receives the input from a dataset entry, sends that input to your service or model, and returns the final output that Spec27 evaluates.

Before You Begin

  • You have a project.
  • You know what inputs the agent is expected to handle.

Create an Agent

  1. Open Agents inside a project.
  2. Create a new agent and give it a descriptive title.
  3. Add or edit the agent content.
  4. Save the agent.
  5. If needed, configure any required rate-limit settings.
  6. Open the agent detail page.

Add Agent Code

Agent content is JavaScript that returns an async function. In the typical case that function acts as a small REST client, following this pattern:

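return async function process(input) {
  // Send the dataset input to your REST endpoint.
  const response = await fetch("https://your-service.example.com/respond", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ input }),
  });

  // Fail loudly on non-2xx responses instead of returning an error payload for scoring.
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }

  // Return the final output that Spec27 evaluates.
  const data = await response.json();
  return data.output;
}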

This keeps the agent focused on one job:

  • receive the dataset input
  • send that input to your REST endpoint
  • read the response
  • return the final output that Spec27 should evaluate

Before you save the agent, confirm that:

  • the request URL is correct
  • the request method matches your API
  • the request body matches your API contract
  • the returned value is the final output you want Spec27 to score
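
One way to check the URL, method, body, and returned value before saving is to run the same logic locally against a stubbed fetch that records the request. The following is a minimal sketch, not a Spec27 feature: `buildProcess`, `makeRecordingFetch`, and `inspectRequest` are hypothetical helper names, and the injectable `fetchImpl` parameter exists only so the request can be inspected. The agent content you save would still call the global `fetch` directly.

```javascript
// Hypothetical wrapper (not part of Spec27): builds the agent function around
// an injectable fetch so the outgoing request can be inspected locally.
function buildProcess(fetchImpl = globalThis.fetch) {
  return async function process(input) {
    const response = await fetchImpl("https://your-service.example.com/respond", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ input }),
    });
    const data = await response.json();
    return data.output;
  };
}

// Stub fetch that records every call and answers with a canned JSON body.
function makeRecordingFetch(cannedBody) {
  const calls = [];
  const fetchStub = async (url, options) => {
    calls.push({ url, options });
    return { ok: true, status: 200, json: async () => cannedBody };
  };
  return { fetchStub, calls };
}

// Runs the agent once against the stub and reports what was sent and returned.
async function inspectRequest(sampleInput) {
  const { fetchStub, calls } = makeRecordingFetch({ output: "stubbed answer" });
  const output = await buildProcess(fetchStub)(sampleInput);
  const { url, options } = calls[0];
  return { url, method: options.method, body: JSON.parse(options.body), output };
}
```

Calling `inspectRequest("a sample input")` resolves to an object containing the request URL, method, parsed body, and the returned output, which you can compare against your API contract by eye or in a unit test.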

Review the Agent

After the agent is created, open the agent detail page and review:

  • the current code content
  • required secrets
  • whether any required secrets are missing
  • rate-limit settings
  • eval usage

If you want to test the agent before wiring it into an eval, open Preview from the agent page.

Test the Agent with Preview

Use the agent run page for a quick preview before attaching the agent to an eval.

To test the agent:

  1. Open the saved agent.
  2. Select Preview.
  3. Enter a sample input.
  4. Run the preview.
  5. Review the output.
  6. Review the console output.

Preview is the fastest way to validate that:

  • the agent accepts the expected input shape
  • required secrets are available
  • the output and logs look reasonable
  • the agent is ready for a more formal run

Preview does not create a persisted eval run. It is meant for quick validation while you are still building or debugging the agent.
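
Because Preview shows console output alongside the result, a few log lines in the agent can make a preview run easier to read. The sketch below is one possible approach under stated assumptions: `buildLoggingProcess` and its `fetchImpl` and `log` parameters are hypothetical and exist only so the example can run outside Spec27. In saved agent content you would call `fetch` and `console.log` directly. Avoid logging secrets or sensitive input.

```javascript
// Hypothetical wrapper (not part of Spec27): the same REST-client pattern,
// with log lines added at each stage of the request.
function buildLoggingProcess(fetchImpl = globalThis.fetch, log = console.log) {
  return async function process(input) {
    // Log the input shape so a mismatch with the dataset entry is visible.
    log("agent input type:", typeof input);

    const startedAt = Date.now();
    const response = await fetchImpl("https://your-service.example.com/respond", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ input }),
    });
    // Log the HTTP status and latency of the upstream call.
    log("response status:", response.status, "elapsed ms:", Date.now() - startedAt);

    const data = await response.json();
    // Log the type of the value that will be returned for scoring.
    log("output type:", typeof data.output);
    return data.output;
  };
}
```

With lines like these, a preview run shows whether the input arrived in the expected shape, whether the request succeeded, and what kind of value was returned.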

Check Preview Results

When you run a preview, review:

  • whether the output matches the expected format
  • whether the returned value is complete and usable for scoring
  • whether console logs show any runtime issues
  • whether the request to your API succeeded

If the preview fails, the page can show the error type, message, and stack trace. Use that information to decide whether the issue is in the agent code, the API contract, or the upstream service.
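
Telling those failure sources apart is easier when the agent throws errors with distinct names and messages. The following is a rough sketch under stated assumptions: `UpstreamError`, `ContractError`, and `buildProcessWithErrors` are hypothetical names, the mapping of status codes to causes is a simplification, and this page does not confirm how Spec27 derives the displayed error type from a thrown error. The `fetchImpl` parameter exists only so the example can run locally.

```javascript
// Hypothetical error types (not part of Spec27) for two failure sources.
class UpstreamError extends Error {
  constructor(message) {
    super(message);
    this.name = "UpstreamError"; // the service is unreachable or failing
  }
}

class ContractError extends Error {
  constructor(message) {
    super(message);
    this.name = "ContractError"; // request or response does not match the API contract
  }
}

function buildProcessWithErrors(fetchImpl = globalThis.fetch) {
  return async function process(input) {
    let response;
    try {
      response = await fetchImpl("https://your-service.example.com/respond", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ input }),
      });
    } catch (cause) {
      // fetch rejects when the request cannot be completed (DNS, network, and so on).
      throw new UpstreamError(`Request could not be completed: ${cause.message}`);
    }

    if (response.status >= 500) {
      throw new UpstreamError(`Service failed with status ${response.status}`);
    }
    if (!response.ok) {
      // Other non-2xx statuses usually mean the service rejected the request itself.
      throw new ContractError(`Service rejected the request with status ${response.status}`);
    }

    let data;
    try {
      data = await response.json();
    } catch {
      throw new ContractError("Response body is not valid JSON");
    }
    if (data === null || typeof data !== "object" || !("output" in data)) {
      throw new ContractError("Response JSON has no `output` field");
    }
    return data.output;
  };
}
```

Any error that is neither an `UpstreamError` nor a `ContractError` (for example a `TypeError` or `ReferenceError`) then most likely points at the agent code itself.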

Use the Agent in Evals

After preview works as expected, add the agent to one or more evals.

The agent detail page also shows:

  • which evals already use the agent
  • a link to open those evals
  • a results view for the agent across related runs

This makes the agent page the main place to move between implementation, testing, and reuse.

Important Notes

  • Agent preview is blocked when required secrets are missing.
  • Agent runs can capture output, errors, and console logs.
  • Agents can be reused across multiple evals.
  • Preview is the fastest way to validate that the agent works before you commit it to a broader eval workflow.