Attack Methods
This page is the current attack-method catalog for Spec27.
Use it when you want to understand:
- which attacks are available
- which team flow (Gold Team or Red Team) each attack fits
Gold Team methods
| Name | Slug | What it does |
|---|---|---|
| Adjacent key | adjacent_key | Randomly swap keyboard characters with a neighbor key. |
| Adjacent key (heavy) | adjacent_key_heavy | Combinatorial adjacent-key typos with optional semantic filtering. |
| ESL | esl | Approximate ESL-like inflectional errors using spaCy POS tags. |
| ESL (LLM) | esl_llm | Rewrite prompts with ESL-like errors via an LLM. |
| Homoglyph | homoglyph | Replace ASCII characters with visually similar Unicode homoglyphs. |
| Caesar cipher | caesar_cipher | Encode the request through a reversible Caesar shift with an in-band key sentence. |
| In-band key (word substitution) | in_band_key_word_substitution | Use a reversible word-substitution rule inside the prompt. |
| Lexical substitution | lexical_substitution | Replace key words with close synonyms via an LLM. |
| Paraphrase | paraphrase | Paraphrase prompts via an LLM with optional semantic filtering. |
| Persona broad | persona_broad | Rewrite prompts as if authored by a sampled persona. |
| Persona pro | persona_pro | Rewrite prompts as if authored by a sampled professional persona. |
| QA confusion | qa_confusion | Add irrelevant domain-specific context to confuse QA systems. |
| Reversal | reversal | Generate semantically similar prompts that elicit plausible incorrect answers. |
| Sentence structure | sentence_structure | Rewrite sentence structure via sampled LLM strategies. |
| Special character | special_character | Perturb important spans with special characters and optional semantic checks. |
| Texting | texting | Rewrite prompts in UK SMS or WhatsApp style. |
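To make the character-level descriptions in the table concrete, here is a minimal Python sketch of three of the perturbations: an adjacent-key typo, a homoglyph swap, and a Caesar shift. It is an illustration only and is not Spec27's implementation. The function names, the `rate` parameter, and both lookup maps (`QWERTY_NEIGHBORS`, `HOMOGLYPHS`) are assumptions made for this example, and the maps are deliberately partial.

```python
import random

# Hypothetical, partial QWERTY neighbor map (illustration only).
QWERTY_NEIGHBORS = {
    "a": "qwsz", "e": "wsdr", "i": "ujko", "o": "iklp",
    "s": "awedxz", "t": "rfgy", "n": "bhjm", "r": "edft",
}

# Hypothetical, partial homoglyph map: ASCII letter -> Cyrillic look-alike.
HOMOGLYPHS = {
    "a": "\u0430", "e": "\u0435", "o": "\u043e", "p": "\u0440", "c": "\u0441",
}


def adjacent_key(text: str, rate: float, rng: random.Random) -> str:
    """Replace each mapped character with a neighboring key with probability `rate`."""
    out = []
    for ch in text:
        neighbors = QWERTY_NEIGHBORS.get(ch.lower())
        if neighbors and rng.random() < rate:
            out.append(rng.choice(neighbors))
        else:
            out.append(ch)
    return "".join(out)


def homoglyph(text: str) -> str:
    """Swap every mapped ASCII character for its visually similar Unicode character."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)


def caesar_shift(text: str, shift: int) -> str:
    """Shift ASCII letters by `shift` positions; a negative shift reverses the encoding."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so repeated runs give the same typos
    prompt = "reset the password"
    print(adjacent_key(prompt, rate=0.2, rng=rng))
    print(homoglyph(prompt))
    print(caesar_shift(prompt, 3))
```

The Caesar shift is reversible by applying the negative shift, which is the property the in-band key sentence relies on. The optional semantic filtering mentioned for some methods is not modeled here.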
Red Team methods
| Name | Slug | What it does |
|---|---|---|
| Dynamic grandma | dynamic_grandma | Generate grandma-style jailbreak prompts via an LLM. |
| Fictional roleplay | fictional_roleplay | Generate creative-writing roleplay jailbreak prompts via an LLM. |
| GPT fuzzer | gpt_fuzzer | Mutate jailbreak templates and synthesize them with forbidden actions. |
| Grandma | grandma | Roleplay as a grandmother to elicit restricted content. |
| Legal compliance | legal_compliance | Generate obfuscated rewrites by sampling strategies and applying them via an LLM. |
| Poetic jailbreak | poetic_jailbreak | Translate forbidden requests into structured procedural poems via an LLM. |
| Privileged mode | privileged_mode | Generate system or developer roleplay jailbreak prompts via an LLM. |
| Research bypass | research_bypass | Generate academic or historical framing jailbreak prompts via an LLM. |
| Sensitive info code wrapper | sensitive_info_code_wrapper | Ask for sensitive information in code or schema-shaped output formats. |
| Sensitive info cognitive overload | sensitive_info_cognitive_overload | Bury an exfiltration request inside a long multi-step prompt. |
| Sensitive info roleplay | sensitive_info_roleplay | Use persona or roleplay setups that make disclosure feel appropriate. |
| AutoDAN zero-shot | zero_shot_autodan | Use developer-mode jailbreak templates filtered through a surrogate model. |
Practical guidance
- Choose methods based on what you are testing: robustness against normal input variation (the Gold Team methods are mostly of this kind) or resistance to harmful or misuse-oriented prompts (the Red Team methods).
- In Red Team work, the built-in suites are often easier to start with than selecting every method manually.
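If you do select methods manually, it can help to keep the slugs grouped by team flow. The sketch below copies the slugs from the tables on this page into a plain Python dictionary. The dictionary layout and the `team_for` helper are hypothetical and are not part of Spec27.

```python
# Slugs copied from the tables on this page.
# The grouping structure and the helper below are hypothetical, for illustration only.
METHODS_BY_TEAM = {
    "gold": [
        "adjacent_key", "adjacent_key_heavy", "esl", "esl_llm", "homoglyph",
        "caesar_cipher", "in_band_key_word_substitution", "lexical_substitution",
        "paraphrase", "persona_broad", "persona_pro", "qa_confusion",
        "reversal", "sentence_structure", "special_character", "texting",
    ],
    "red": [
        "dynamic_grandma", "fictional_roleplay", "gpt_fuzzer", "grandma",
        "legal_compliance", "poetic_jailbreak", "privileged_mode",
        "research_bypass", "sensitive_info_code_wrapper",
        "sensitive_info_cognitive_overload", "sensitive_info_roleplay",
        "zero_shot_autodan",
    ],
}


def team_for(slug: str) -> str:
    """Return the team flow ("gold" or "red") that lists the given slug."""
    for team, slugs in METHODS_BY_TEAM.items():
        if slug in slugs:
            return team
    raise KeyError(f"unknown attack method slug: {slug}")


if __name__ == "__main__":
    for slug in ("homoglyph", "zero_shot_autodan"):
        print(slug, "->", team_for(slug))
```

A lookup like this makes it easy to check that a hand-picked selection does not mix robustness perturbations with jailbreak-style methods by accident.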