AI Governance Institute logo
AI Governance Institute

Practical Governance for Enterprise AI

Security
SEC · SecuritySEC-005High effortAgent-relevant

Adversarial Robustness Testing

Systematically test AI systems against adversarial inputs, edge cases, and known attack techniques before deployment and on a recurring basis.

Objective

Identify vulnerabilities in AI system behavior under adversarial conditions before they are exploited in production.

Maturity Levels

1

Initial

No adversarial testing is performed; systems are tested only for functional correctness.

2

Developing

Ad hoc adversarial testing is performed by individual engineers without a defined scope or methodology.

3

Defined

A documented adversarial testing program is conducted before each major deployment, covering prompt injection, data poisoning, and output manipulation.

4

Managed

Testing results are tracked over time; findings are remediated with defined SLAs; testing scope expands as new attack techniques emerge.

5

Optimizing

Automated adversarial probing runs continuously in staging; novel attack techniques are sourced from threat intelligence.

Evidence Requirements

What an auditor or assessor would expect to see for this control.

  • Structured test plan including threat model, test categories, pass criteria, and tester role assignments
  • Test results report with findings, severity ratings, and current remediation status for each finding
  • Remediation tracking records confirming critical and high findings were resolved within defined SLAs
  • Retest results confirming all findings were addressed before the system was promoted to production
  • Testing cadence records confirming tests were conducted on schedule and triggered by model or prompt changes

Implementation Notes

Key steps

  • Distinguish adversarial robustness testing from functional QA — the goal is to break the system, not confirm it works normally.
  • Cover at minimum: prompt injection, jailbreaks, data extraction attempts, role confusion, and boundary violation tests.
  • Use structured red-team methodology: assign a dedicated team, define the threat model, document findings formally, and track remediation.
  • Retest after every significant model update or prompt change — robustness properties do not transfer automatically between versions.

Example Implementation

Financial institution testing an AI model for credit application decisions before production deployment

Adversarial Robustness Test Plan — Credit Decision Model v3.1

Test categories and pass criteria:

CategoryTest CasesPass Criterion
Prompt injection40 cases — instruction override attempts in free-text fields0 instructions followed
Jailbreak25 cases — role-play and persona switching attempts0 policy violations
Boundary probing30 cases — inputs at edge of training distributionGraceful degradation, no confidence inflation
Data extraction20 cases — attempts to elicit training data or other applicants' information0 data leaks
Adversarial feature manipulation50 cases — systematically modified inputs designed to flip decision< 5% unexpected flips

Methodology: Dedicated red team (2 engineers, 1 external consultant); findings documented in structured report with severity ratings

Remediation SLAs: Critical findings block deployment; High findings require remediation plan before deployment; Medium tracked in backlog

Retest requirement: Any model update or prompt change triggers re-run of full test suite before promotion to production

Control Details

Control ID
SEC-005
Domain
Security
Typical owner
AI Security / Red Team
Implementation effort
High effort
Agent-relevant
Yes

Tags

red teamingadversarial testingrobustnesssecurity testing