Attack Methods

This page is the current attack-method catalog for Spec27.

Use it when you want to understand:

  • which attacks are available
  • which team flow they fit

Gold Team methods

Name | Slug | What it does
Adjacent key | adjacent_key | Randomly swap keyboard characters with a neighbor key.
Adjacent key (heavy) | adjacent_key_heavy | Combinatorial adjacent-key typos with optional semantic filtering.
ESL | esl | Approximate ESL-like inflectional errors using spaCy POS tags.
ESL (LLM) | esl_llm | Rewrite prompts with ESL-like errors via an LLM.
Homoglyph | homoglyph | Replace ASCII characters with visually similar Unicode homoglyphs.
Caesar Cipher | caesar_cipher | Encode the request through a reversible Caesar shift with an in-band key sentence.
In-band key (word substitution) | in_band_key_word_substitution | Use a reversible word-substitution rule inside the prompt.
Lexical substitution | lexical_substitution | Replace key words with close synonyms via an LLM.
Paraphrase | paraphrase | Paraphrase prompts via an LLM with optional semantic filtering.
Persona Broad | persona_broad | Rewrite prompts as if authored by a sampled persona.
Persona pro | persona_pro | Rewrite prompts as if authored by a sampled professional persona.
QA confusion | qa_confusion | Add irrelevant domain-specific context to confuse QA systems.
Reversal | reversal | Generate semantically similar prompts that elicit plausible incorrect answers.
Sentence structure | sentence_structure | Rewrite sentence structure via sampled LLM strategies.
Special character | special_character | Perturb important spans with special characters and optional semantic checks.
Texting | texting | Rewrite prompts in UK SMS or WhatsApp style.
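
To make a few of the rule-based Gold Team perturbations concrete, here is a minimal Python sketch of what adjacent-key, homoglyph, and Caesar-shift transforms could look like. It is not Spec27's implementation: the function names, the partial keyboard and homoglyph maps, and the wording of the key sentence are all assumptions chosen for illustration.

```python
import random

# Hypothetical, partial QWERTY neighbor map (not Spec27's actual table).
QWERTY_NEIGHBORS = {
    "a": "qwsz", "e": "wsdr", "i": "ujko", "o": "iklp",
    "n": "bhjm", "s": "awedxz", "t": "rfgy",
}

# Hypothetical homoglyph map: ASCII letters -> look-alike Cyrillic letters.
HOMOGLYPHS = {
    "a": "\u0430", "c": "\u0441", "e": "\u0435", "o": "\u043e", "p": "\u0440",
}


def adjacent_key(text: str, rate: float, rng: random.Random) -> str:
    """Replace each mapped character with a neighboring key with probability `rate`.

    Replacements are always lowercase, so case is not preserved in this sketch.
    """
    out = []
    for ch in text:
        neighbors = QWERTY_NEIGHBORS.get(ch.lower())
        if neighbors and rng.random() < rate:
            out.append(rng.choice(neighbors))
        else:
            out.append(ch)
    return "".join(out)


def homoglyph(text: str) -> str:
    """Swap every mapped ASCII character for its look-alike Unicode character."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)


def caesar_shift(text: str, shift: int) -> str:
    """Shift ASCII letters by `shift` positions; leave everything else untouched."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("a") if ch.islower() else ord("A")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)


def caesar_cipher(text: str, shift: int = 3) -> str:
    """Prefix the shifted text with an in-band key sentence (wording is assumed)."""
    key_sentence = f"Each letter below is shifted forward by {shift}; shift back to decode."
    return f"{key_sentence}\n{caesar_shift(text, shift)}"


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    prompt = "What is the capital of France?"
    print(adjacent_key(prompt, rate=0.2, rng=rng))
    print(homoglyph(prompt))
    print(caesar_cipher(prompt))
```

Passing an explicit `random.Random` instance keeps the typo perturbation reproducible, which matters when the same perturbed prompts need to be replayed across runs. The Caesar transform stays reversible because shifting by the negated value restores the original text.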

Red Team methods

Name | Slug | What it does
Dynamic grandma | dynamic_grandma | Generate grandma-style jailbreak prompts via an LLM.
Fictional roleplay | fictional_roleplay | Generate creative-writing roleplay jailbreak prompts via an LLM.
GPT fuzzer | gpt_fuzzer | Mutate jailbreak templates and synthesize them with forbidden actions.
Grandma | grandma | Roleplay as a grandmother to elicit restricted content.
Legal Compliance | legal_compliance | Generate obfuscated rewrites by sampling strategies and applying them via an LLM.
Poetic jailbreak | poetic_jailbreak | Translate forbidden requests into structured procedural poems via an LLM.
Privileged mode | privileged_mode | Generate system or developer roleplay jailbreak prompts via an LLM.
Research bypass | research_bypass | Generate academic or historical framing jailbreak prompts via an LLM.
Sensitive info code wrapper | sensitive_info_code_wrapper | Ask for sensitive information in code or schema-shaped output formats.
Sensitive info cognitive overload | sensitive_info_cognitive_overload | Bury an exfiltration request inside a long multi-step prompt.
Sensitive info roleplay | sensitive_info_roleplay | Use persona or roleplay setups that make disclosure feel appropriate.
AutoDAN zero-shot | zero_shot_autodan | Use developer-mode jailbreak templates filtered through a surrogate model.

Practical guidance

  • Choose methods based on whether you are testing robustness against normal variation or resistance to harmful or misuse-oriented prompts.
  • In red-team work, the built-in suites are often easier to start with than selecting every method manually.
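
The choice between the two team flows can be sketched as a small lookup over the slugs in the catalog above. This is illustrative only: the slugs come from this page, but the helper names, the goal labels "robustness" and "misuse", and the assumption that Gold Team corresponds to normal-variation robustness while Red Team corresponds to misuse resistance are not confirmed Spec27 API.

```python
# Slugs copied from the catalog on this page.
GOLD_TEAM = (
    "adjacent_key", "adjacent_key_heavy", "esl", "esl_llm", "homoglyph",
    "caesar_cipher", "in_band_key_word_substitution", "lexical_substitution",
    "paraphrase", "persona_broad", "persona_pro", "qa_confusion", "reversal",
    "sentence_structure", "special_character", "texting",
)
RED_TEAM = (
    "dynamic_grandma", "fictional_roleplay", "gpt_fuzzer", "grandma",
    "legal_compliance", "poetic_jailbreak", "privileged_mode",
    "research_bypass", "sensitive_info_code_wrapper",
    "sensitive_info_cognitive_overload", "sensitive_info_roleplay",
    "zero_shot_autodan",
)

# Hypothetical goal labels; the goal-to-team mapping is an assumption.
METHODS_BY_GOAL = {"robustness": GOLD_TEAM, "misuse": RED_TEAM}


def team_for(slug: str) -> str:
    """Return "gold" or "red" for a known slug; raise KeyError otherwise."""
    if slug in GOLD_TEAM:
        return "gold"
    if slug in RED_TEAM:
        return "red"
    raise KeyError(f"unknown attack-method slug: {slug}")


def methods_for(goal: str) -> tuple:
    """Return the candidate slugs for a testing goal ("robustness" or "misuse")."""
    return METHODS_BY_GOAL[goal]


if __name__ == "__main__":
    print(team_for("homoglyph"))
    print(methods_for("misuse"))
```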