Agent red teaming - Grand Central Documentation

Red teaming is a systematic approach for finding vulnerabilities in AI agents before deployment. The starter kits include built-in support for adversarial testing using Promptfoo, enabling you to test for prompt injections, jailbreaks, harmful content generation, and other security issues.

All starter kits include red teaming which runs automatically as part of the CI/CD pipelines.

Overview

Red team testing helps identify security vulnerabilities by simulating adversarial attacks against your agents. The implementation tests across multiple categories:

Prompt injections: Attempts to manipulate agent behavior through crafted inputs
Jailbreaks: Techniques to bypass safety measures and restrictions
Harmful content: Testing for hate speech, violence, and other harmful outputs
PII leakage: Attempts to extract personally identifiable information
Prompt extraction: Attempts to reveal system prompts and internal instructions

File structure

starter-agent/
├── redteam/
│   ├── configs/                        # Promptfoo source configurations
│   │   ├── <agent_name>_promptfooconfig.yaml
│   └── tests/                          # Generated adversarial test cases
│       ├── <agent_name>_redteam.yaml

Running red team tests

Prerequisites

Node.js 20+: Required for Promptfoo CLI
Running agent server: Your agent must be accessible at its endpoint

Step 1: Start your agent

source .env && uv run python -m src.main

Step 2: Run tests

Run the red team evaluation for any of the available agents:

npx promptfoo@latest redteam eval \
  -c redteam/tests/<agent_name>_redteam.yaml

Step 3: View the report

After running tests, launch the interactive report viewer:

npx promptfoo@latest redteam report

The report provides:

Vulnerability categories: Types of issues found (injections, harmful content, etc.)
Severity levels: Classification by potential impact
Detailed logs: Specific inputs that triggered vulnerabilities
Suggested mitigations: Recommendations for addressing issues

Configuration

The configuration file defines the target agent endpoint and red team settings. The following example shows the configuration:

description: Instructions Agent Red Team Configuration

targets:
  - id: http
    label: instructions-agent
    config:
      url: http://localhost:8000/run/instructions_agent
      method: POST
      headers:
        Content-Type: application/json
      body: '{"query":"{{prompt}}"}'
      transformResponse: json.output || json.detail

prompts:
  - '{{prompt}}'

redteam:
  purpose: |-
    An AI assistant that follows user instructions precisely 
    and provides concise, relevant responses.
  
  plugins:
    - id: harmful:hate
      numTests: 5
    - id: harmful:insults
      numTests: 5
    - id: pii:direct
      numTests: 5
    - id: prompt-extraction
      numTests: 5
    - id: hijacking
      numTests: 5
    
  strategies:
    - id: basic
    - id: jailbreak:meta
  
  numTests: 5

For details about available plugins, attack strategies, and advanced configuration options, see the Promptfoo Red Team documentation.

Generating new test cases

To generate fresh adversarial test cases from your configuration:

Interactive setup

# Web UI setup
npx promptfoo@latest redteam setup

# CLI-only setup
npx promptfoo@latest redteam init --no-gui

Generate from existing config

npx promptfoo@latest redteam generate \
  -c redteam/configs/<agent_name>_promptfooconfig.yaml \
  -o redteam/tests/<agent_name>_redteam.yaml

Best practices

Use guardrails

Combine red teaming with Guardrails for defense-in-depth

Update test cases

Regenerate test cases periodically to cover new attack vectors

Next steps

Guardrails: Add runtime safety checks to your agents
Evaluations: Set up comprehensive agent evaluation
CI/CD Workflows: Configure automated testing pipelines

​Overview

​File structure

​Running red team tests

​Prerequisites

​Step 1: Start your agent

​Step 2: Run tests

​Step 3: View the report

​Configuration

​Generating new test cases

​Interactive setup

​Generate from existing config

​Best practices

Use guardrails

Update test cases

​Next steps

Overview

File structure

Running red team tests

Prerequisites

Step 1: Start your agent

Step 2: Run tests

Step 3: View the report

Configuration

Generating new test cases

Interactive setup

Generate from existing config

Best practices

Next steps