Skip to main content
Red teaming is a systematic approach for finding vulnerabilities in AI agents before deployment. The starter kits include built-in support for adversarial testing using Promptfoo, enabling you to test for prompt injections, jailbreaks, harmful content generation, and other security issues.
All starter kits include red teaming which runs automatically as part of the CI/CD pipelines.

Overview

Red team testing helps identify security vulnerabilities by simulating adversarial attacks against your agents. The implementation tests across multiple categories:
  • Prompt injections: Attempts to manipulate agent behavior through crafted inputs
  • Jailbreaks: Techniques to bypass safety measures and restrictions
  • Harmful content: Testing for hate speech, violence, and other harmful outputs
  • PII leakage: Attempts to extract personally identifiable information
  • Prompt extraction: Attempts to reveal system prompts and internal instructions

File structure

starter-agent/
├── redteam/
│   ├── configs/                        # Promptfoo source configurations
│   │   ├── <agent_name>_promptfooconfig.yaml
│   └── tests/                          # Generated adversarial test cases
│       ├── <agent_name>_redteam.yaml

Running red team tests

Prerequisites

  • Node.js 20+: Required for Promptfoo CLI
  • Running agent server: Your agent must be accessible at its endpoint

Step 1: Start your agent

source .env && uv run python -m src.main

Step 2: Run tests

Run the red team evaluation for any of the available agents:
npx promptfoo@latest redteam eval \
  -c redteam/tests/<agent_name>_redteam.yaml

Step 3: View the report

After running tests, launch the interactive report viewer:
npx promptfoo@latest redteam report
The report provides:
  • Vulnerability categories: Types of issues found (injections, harmful content, etc.)
  • Severity levels: Classification by potential impact
  • Detailed logs: Specific inputs that triggered vulnerabilities
  • Suggested mitigations: Recommendations for addressing issues

Configuration

The configuration file defines the target agent endpoint and red team settings. The following example shows the configuration:
description: Instructions Agent Red Team Configuration

targets:
  - id: http
    label: instructions-agent
    config:
      url: http://localhost:8000/run/instructions_agent
      method: POST
      headers:
        Content-Type: application/json
      body: '{"query":"{{prompt}}"}'
      transformResponse: json.output || json.detail

prompts:
  - '{{prompt}}'

redteam:
  purpose: |-
    An AI assistant that follows user instructions precisely 
    and provides concise, relevant responses.
  
  plugins:
    - id: harmful:hate
      numTests: 5
    - id: harmful:insults
      numTests: 5
    - id: pii:direct
      numTests: 5
    - id: prompt-extraction
      numTests: 5
    - id: hijacking
      numTests: 5
    
  strategies:
    - id: basic
    - id: jailbreak:meta
  
  numTests: 5
For details about available plugins, attack strategies, and advanced configuration options, see the Promptfoo Red Team documentation.

Generating new test cases

To generate fresh adversarial test cases from your configuration:

Interactive setup

# Web UI setup
npx promptfoo@latest redteam setup

# CLI-only setup
npx promptfoo@latest redteam init --no-gui

Generate from existing config

npx promptfoo@latest redteam generate \
  -c redteam/configs/<agent_name>_promptfooconfig.yaml \
  -o redteam/tests/<agent_name>_redteam.yaml

Best practices

Use guardrails

Combine red teaming with Guardrails for defense-in-depth

Update test cases

Regenerate test cases periodically to cover new attack vectors

Next steps