Skip to content

[submit] Example: Add reasoning to attacker, two-pass defense to defender#34

Open
sofasogood wants to merge 1 commit intomainfrom
example/attacker-with-reasoning
Open

[submit] Example: Add reasoning to attacker, two-pass defense to defender#34
sofasogood wants to merge 1 commit intomainfrom
example/attacker-with-reasoning

Conversation

@sofasogood
Copy link
Collaborator

What this PR shows

This is an example submission so participants can see exactly what files to change and what a Phase 2 submission looks like.

Attacker change (agents/attacker/agent.py)

  • Added a reasoning step before generating the attack
  • The attacker first analyzes the scenario, defender profile, and previous failed attempts to decide on a strategy
  • Then generates the actual attack informed by that reasoning
  • Uses 2 of the 4 available LLM requests per round

Defender change (agents/defender/agent.py)

  • Added a two-pass defense
  • First pass: analyze the input for manipulation attempts (prompt injection, social engineering, hidden instructions)
  • Second pass: respond to the input with the security analysis in mind
  • Uses 2 of the 4 available LLM requests per round

What to notice

  • Only files inside agents/attacker/ and agents/defender/ were modified
  • No changes to the orchestrator, plugins, or any other framework files
  • The commit message starts with [submit] which triggers the submission workflow
  • You can also use [submit-attacker] or [submit-defender] to submit just one agent

Attacker: before generating an attack, reason about what strategy to
use based on the scenario and previous failed attempts (2 of 4 LLM
requests).

Defender: before responding, analyze the input for manipulation
attempts (prompt injection, social engineering, hidden instructions),
then respond with that analysis in mind (2 of 4 LLM requests).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant