AI Governance Institute

Practical Governance for Enterprise AI

Safety & Reliability
SAF · Safety & Reliability · SAF-006 · High effort · Agent-relevant

Post-Deployment Adversarial Testing Cadence

Schedule and execute recurring adversarial testing of production AI systems on a risk-tiered cadence, separate from and in addition to pre-deployment red-teaming.

Objective

Detect new attack surfaces, capability changes, and safety regressions that emerge after deployment — including those introduced by model updates, prompt changes, new user populations, and novel attack techniques discovered after go-live.

Maturity Levels

  • Level 1 (Initial): Adversarial testing occurs only pre-deployment; production systems are not retested.
  • Level 2 (Developing): Ad hoc adversarial testing occurs when incidents prompt concern; no scheduled cadence exists.
  • Level 3 (Defined): A post-deployment adversarial testing schedule is defined, tiered by system risk level; test plans are documented and results are recorded.
  • Level 4 (Managed): Test results are compared against prior runs to track safety trajectory; regressions trigger remediation before the next release cycle.
  • Level 5 (Optimizing): Adversarial test suites update continuously with newly published attack techniques; automated regression tests run on every model update in production.
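The run-over-run comparison described at the Managed level can be sketched in a few lines. This is a minimal illustration, not a prescribed format: it assumes each run is stored as a mapping from a test case ID to whether the attack was blocked, and the case IDs and function names are hypothetical.

```python
def find_regressions(prior: dict[str, bool], current: dict[str, bool]) -> list[str]:
    """Return attack cases that were blocked in the prior run but not in the current one."""
    return sorted(
        case for case, blocked in current.items()
        if not blocked and prior.get(case) is True
    )


def cases_without_baseline(prior: dict[str, bool], current: dict[str, bool]) -> list[str]:
    """Return cases added since the prior run; they have no baseline to regress from."""
    return sorted(case for case in current if case not in prior)


def block_rate(run: dict[str, bool]) -> float:
    """Share of attack cases blocked in a run, usable as a coarse trajectory metric."""
    return sum(run.values()) / len(run) if run else 0.0


# Hypothetical results: True means the attack was blocked.
prior_run = {"prompt_injection_01": True, "jailbreak_07": True, "exfil_03": False}
current_run = {
    "prompt_injection_01": True,
    "jailbreak_07": False,   # blocked last cycle, succeeded this cycle
    "exfil_03": True,
    "crescendo_01": False,   # new case, no prior result
}

regressions = find_regressions(prior_run, current_run)
new_cases = cases_without_baseline(prior_run, current_run)
```

Keeping regressions separate from newly added cases matters for reporting: a new attack that succeeds is a new finding, while a previously blocked attack that now succeeds indicates that something in the system changed.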

Evidence Requirements

What an auditor or assessor would expect to see for this control.

  • Post-deployment adversarial testing schedule with risk-tiered cadence documentation
  • Test plans for each production system showing attack categories covered
  • Executed test reports with findings, severity ratings, and remediation records
  • Comparison reports showing safety trajectory across test cycles
  • Evidence that test suites are updated with newly published attack techniques

Implementation Notes

Key steps

  • Establish a testing cadence tied to risk tier. At minimum, test Critical systems quarterly, Significant systems semi-annually, and Limited systems annually.
  • Separate post-deployment test suites from pre-deployment suites — post-deployment testing should include attack patterns that have emerged since initial deployment.
  • Include in each test: prompt injection, jailbreak attempts using techniques published since last test, data exfiltration attempts, and output manipulation targeting your specific use case.
  • For high-risk sectors (healthcare, financial services), include sector-specific adversarial scenarios relevant to your regulatory environment.
  • Treat adversarial test reports as board-reportable for Critical systems — if a new attack pattern succeeds against a production system, that is a governance event.
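The risk-tiered cadence in the key steps can be turned into a simple due-date check. The sketch below assumes that quarterly, semi-annual, and annual mean 3, 6, and 12 calendar months from the last completed test; that interpretation, along with the function names, is an assumption rather than part of the control.

```python
import calendar
from datetime import date

# Maximum months between tests per risk tier (assumed reading of the cadence above).
CADENCE_MONTHS = {"Critical": 3, "Significant": 6, "Limited": 12}


def next_test_due(last_test: date, tier: str) -> date:
    """Return the latest date by which the next adversarial test should run."""
    month_index = last_test.month - 1 + CADENCE_MONTHS[tier]
    year = last_test.year + month_index // 12
    month = month_index % 12 + 1
    # Clamp the day for shorter months, e.g. the 31st rolls back to the 28th in February.
    day = min(last_test.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def is_overdue(last_test: date, tier: str, today: date) -> bool:
    """True if the system has gone longer than its tier allows without a test."""
    return today > next_test_due(last_test, tier)


due = next_test_due(date(2026, 5, 28), "Critical")
```

A check like this could run in a scheduled job against the system inventory, so that an overdue Critical system is flagged before an assessor finds the gap.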

Example Implementation

Customer-facing AI assistant for a financial services firm

Post-Deployment Adversarial Test Report — Q2 2026

System: Customer Service AI Assistant (Critical tier)
Test date: 2026-05-28 | Prior test: 2026-02-19

New attack techniques tested since last cycle:

  • Many-shot jailbreaking (published Feb 2026): PASSED (attack blocked by guardrails)
  • Crescendo multi-turn escalation: PARTIAL PASS (escalation blocked at turn 4, not at turn 2 as expected)
  • Financial advice boundary probing (sector-specific): PASSED

Regressions vs. prior test: None identified.

New finding: Crescendo resistance is weaker than expected (escalation blocked at turn 4 rather than turn 2). Guardrail threshold requires tuning.

Remediation: Guardrail update deployed 2026-05-30. Retest scheduled for 2026-06-05.
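A report like this is easier to compare across cycles when it is also captured as structured data. The following is a minimal sketch under stated assumptions: the class and field names are hypothetical, and the rule that only a fully successful attack on a Critical system counts as a governance event is one reading of the guidance above, not a definition from the control.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class Outcome(Enum):
    BLOCKED = "attack blocked"
    PARTIAL = "attack partially blocked"
    SUCCEEDED = "attack succeeded"


@dataclass
class Finding:
    technique: str
    outcome: Outcome
    note: str = ""


@dataclass
class AdversarialTestReport:
    system: str
    tier: str
    test_date: date
    prior_test_date: date
    findings: list[Finding] = field(default_factory=list)

    def needs_remediation(self) -> list[Finding]:
        """Anything not fully blocked is queued for remediation and retest."""
        return [f for f in self.findings if f.outcome is not Outcome.BLOCKED]

    def is_governance_event(self) -> bool:
        """Assumed rule: a successful attack on a Critical system is board-reportable."""
        return self.tier == "Critical" and any(
            f.outcome is Outcome.SUCCEEDED for f in self.findings
        )


# The Q2 2026 example report expressed as data.
report = AdversarialTestReport(
    system="Customer Service AI Assistant",
    tier="Critical",
    test_date=date(2026, 5, 28),
    prior_test_date=date(2026, 2, 19),
    findings=[
        Finding("Many-shot jailbreaking", Outcome.BLOCKED, "blocked by guardrails"),
        Finding("Crescendo multi-turn escalation", Outcome.PARTIAL, "blocked at turn 4, expected turn 2"),
        Finding("Financial advice boundary probing", Outcome.BLOCKED),
    ],
)

to_fix = [f.technique for f in report.needs_remediation()]
```

Recording outcomes from the attacker's side (blocked, partially blocked, succeeded) avoids the ambiguity of pass/fail labels, where "failed" can refer to either the attack or the system.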

Control Details

Control ID: SAF-006
Typical owner: Security / AI Red Team
Implementation effort: High
Agent-relevant: Yes

Tags

red teaming · adversarial testing · post-deployment · safety · regression testing