All starter kits include red team tests, which run automatically as part of the CI/CD pipelines.
Overview
Red team testing helps identify security vulnerabilities by simulating adversarial attacks against your agents. The implementation tests across multiple categories:
- Prompt injections: Attempts to manipulate agent behavior through crafted inputs
- Jailbreaks: Techniques to bypass safety measures and restrictions
- Harmful content: Testing for hate speech, violence, and other harmful outputs
- PII leakage: Attempts to extract personally identifiable information
- Prompt extraction: Attempts to reveal system prompts and internal instructions
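As a rough illustration, the categories above could be mapped onto Promptfoo plugin and strategy identifiers as sketched below. The identifiers come from Promptfoo's public plugin and strategy lists. Which of them the starter kits actually enable is an assumption, so treat the kit's own configuration file as the source of truth.

```python
# Hypothetical mapping from the categories above to Promptfoo red team identifiers.
# Assumption: the starter kits may enable a different set; check the kit's config.
CATEGORY_TO_PROMPTFOO = {
    "Prompt injections": {"strategies": ["prompt-injection"]},
    "Jailbreaks": {"strategies": ["jailbreak"]},
    "Harmful content": {"plugins": ["harmful:hate", "harmful:violent-crime"]},
    "PII leakage": {"plugins": ["pii"]},
    "Prompt extraction": {"plugins": ["prompt-extraction"]},
}


def build_redteam_section(categories):
    """Collect the plugins and strategies for the selected categories."""
    plugins, strategies = [], []
    for name in categories:
        entry = CATEGORY_TO_PROMPTFOO[name]
        plugins.extend(entry.get("plugins", []))
        strategies.extend(entry.get("strategies", []))
    return {"plugins": plugins, "strategies": strategies}
```

For example, `build_redteam_section(["PII leakage", "Jailbreaks"])` collects one plugin and one strategy. In Promptfoo's model, plugins generate the adversarial inputs and strategies control how those inputs are delivered, which is why injections and jailbreaks sit on the strategy side in this sketch.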
File structure
Running red team tests
Prerequisites
- Node.js 20+: Required for the Promptfoo CLI
- Running agent server: Your agent must be accessible at its endpoint
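Both prerequisites can be checked from a script before the tests start. The following is a minimal sketch, not part of the starter kits: `AGENT_URL` is a hypothetical placeholder for your agent's endpoint, and the reachability check only confirms that something accepts TCP connections on that host and port.

```python
import re
import shutil
import socket
import subprocess
from urllib.parse import urlparse

MIN_NODE_MAJOR = 20
AGENT_URL = "http://localhost:8000"  # hypothetical; use your agent's endpoint


def node_major(version_text):
    """Extract the major version from `node --version` output such as 'v20.11.1'."""
    match = re.match(r"v?(\d+)\.", version_text.strip())
    if match is None:
        raise ValueError(f"unrecognized Node.js version: {version_text!r}")
    return int(match.group(1))


def node_is_recent_enough():
    """Return True if `node` is on PATH and its major version is at least 20."""
    if shutil.which("node") is None:
        return False
    result = subprocess.run(
        ["node", "--version"], capture_output=True, text=True, check=True
    )
    return node_major(result.stdout) >= MIN_NODE_MAJOR


def agent_is_reachable(url, timeout=2.0):
    """Return True if a TCP connection to the agent's host and port succeeds."""
    parsed = urlparse(url)
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_prerequisites(agent_url=AGENT_URL):
    """Report the status of both prerequisites as a dictionary of booleans."""
    return {
        "node_20_plus": node_is_recent_enough(),
        "agent_reachable": agent_is_reachable(agent_url),
    }
```

Calling `check_prerequisites()` returns a dictionary of two booleans. A `False` for `agent_reachable` usually means the agent server has not been started yet.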
Step 1: Start your agent
Step 2: Run tests
Run the red team evaluation for any of the available agents.
Step 3: View the report
After running tests, launch the interactive report viewer. The report shows:
- Vulnerability categories: Types of issues found (injections, harmful content, etc.)
- Severity levels: Classification by potential impact
- Detailed logs: Specific inputs that triggered vulnerabilities
- Suggested mitigations: Recommendations for addressing issues
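Steps 2 and 3 can be scripted together. The sketch below wraps the Promptfoo CLI subcommands `redteam run` and `redteam report`. The config path is a hypothetical placeholder, and the starter kits may invoke Promptfoo differently (for example through a wrapper script), so treat this as an assumption rather than the kits' actual commands.

```python
import subprocess

# Runs the CLI through npx so no global install is needed.
PROMPTFOO = ["npx", "promptfoo@latest"]


def redteam_run_command(config_path):
    """Build the command that generates and runs the red team evaluation."""
    return [*PROMPTFOO, "redteam", "run", "--config", config_path]


def redteam_report_command():
    """Build the command that opens the interactive report viewer."""
    return [*PROMPTFOO, "redteam", "report"]


def run_and_report(config_path="promptfooconfig.yaml"):
    """Run the evaluation, then open the report (config path is hypothetical)."""
    subprocess.run(redteam_run_command(config_path), check=True)
    subprocess.run(redteam_report_command(), check=True)
```

With `check=True`, a non-zero exit from the evaluation raises an exception before the report step, so a failed run is not hidden behind an older report.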
Configuration
The configuration file defines the target agent endpoint and red team settings.
For details about available plugins, attack strategies, and advanced configuration options, see the Promptfoo Red Team documentation.
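To make the two parts of the configuration concrete, the sketch below builds a Promptfoo-style config with an HTTP target and a `redteam` section, then writes it as JSON. All values are illustrative assumptions: the endpoint URL, the request body shape, the response field, the purpose text, and the choice of plugins and strategies are not taken from the starter kits.

```python
import json

# Hypothetical endpoint; replace with your agent's URL.
AGENT_URL = "http://localhost:8000/chat"


def build_config(agent_url):
    """Return a config dict with an HTTP target and red team settings."""
    return {
        "description": "Red team evaluation (illustrative)",
        "targets": [
            {
                "id": "http",
                "config": {
                    "url": agent_url,
                    "method": "POST",
                    "headers": {"Content-Type": "application/json"},
                    # Promptfoo substitutes each adversarial input for {{prompt}}.
                    "body": {"message": "{{prompt}}"},
                    # Assumes the agent replies with JSON like {"response": "..."}.
                    "transformResponse": "json.response",
                },
            }
        ],
        "redteam": {
            "purpose": "Customer support agent",  # hypothetical description
            "numTests": 5,
            "plugins": ["harmful:hate", "pii", "prompt-extraction"],
            "strategies": ["jailbreak", "prompt-injection"],
        },
    }


def write_config(path, agent_url=AGENT_URL):
    """Serialize the config to disk as indented JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(build_config(agent_url), handle, indent=2)
```

Promptfoo configurations are usually written in YAML. JSON output is used here only to avoid a third-party YAML dependency in the sketch.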
Generating new test cases
To generate fresh adversarial test cases from your configuration:
Interactive setup
Generate from existing config
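The two approaches above correspond to different Promptfoo CLI subcommands. The sketch below assumes `redteam init` for the interactive setup and `redteam generate` for an existing config. The file names are hypothetical defaults, and the starter kits may use different paths.

```python
import subprocess

# Runs the CLI through npx so no global install is needed.
PROMPTFOO = ["npx", "promptfoo@latest"]


def interactive_setup_command():
    """Command for the guided setup that creates a new red team config."""
    return [*PROMPTFOO, "redteam", "init"]


def generate_command(config_path, output_path):
    """Command that generates test cases from an existing config file."""
    return [
        *PROMPTFOO, "redteam", "generate",
        "--config", config_path,
        "--output", output_path,
    ]


def regenerate(config_path="promptfooconfig.yaml", output_path="redteam.yaml"):
    """Regenerate test cases; both paths are hypothetical placeholders."""
    subprocess.run(generate_command(config_path, output_path), check=True)
```

Writing the generated cases to a separate output file keeps the hand-edited configuration apart from the generated tests, which makes periodic regeneration easier to review in version control.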
Best practices
Use guardrails
Combine red teaming with Guardrails for defense-in-depth
Update test cases
Regenerate test cases periodically to cover new attack vectors
Next steps
- Guardrails: Add runtime safety checks to your agents
- Evaluations: Set up comprehensive agent evaluation
- CI/CD Workflows: Configure automated testing pipelines