AI Governance Institute

AI governance intelligence, tracked daily


Question 33 of 34

What do we do when an AI system causes harm or fails?

A structured incident response process for AI failures — from initial detection through containment, root cause investigation, regulatory notification, and prevention.

If you only do 3 things, do this:

  1. Define what counts as an AI incident before one happens. If you're deciding whether something is an incident after it occurs, you've already lost time you needed for containment.
  2. Containment is the first question: can the system be paused, the output recalled, or the decision reversed? These questions need pre-assigned answers. You don't have time to deliberate when something is actively going wrong.
  3. Document the post-incident report even for near-misses. Regulators view documented incidents plus corrective action as evidence of a functioning governance program.

The Situation

Who this is for: Risk, compliance, legal, and technology teams responsible for AI operations and incident response

When you need this: When an AI system produces harmful, discriminatory, inaccurate, or unexpected outputs — or when a near-miss is detected before harm occurs

The Decision

Is this incident within acceptable parameters, and if not, what is our response — immediate containment, investigation, regulatory notification, and remediation?

The Steps

  1. Detect and triage: categorize the incident (bias/discrimination, accuracy failure, safety issue, privacy breach, regulatory violation, reputational harm) and assign a severity level
  2. Contain: pause the system, recall outputs, or reverse affected decisions as appropriate for the incident type and severity
  3. Notify internally: escalate to the appropriate level based on severity; loop in legal for any incident with regulatory or litigation implications
  4. Assess regulatory notification obligations: which jurisdictions' rules require notification, to whom, and within what timeframe?
  5. Investigate: identify the root cause (model failure, data issue, process failure, misuse, or unexpected input distribution)
  6. Remediate: fix the root cause, not just the symptom; re-test before restoring full operation
  7. Document: produce a post-incident report covering timeline, cause, impact, regulatory notifications made, and prevention measures implemented
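
As an illustration, the triage and escalation logic in steps 1 and 3 can be sketched in code. The incident categories, severity levels, and escalation policy below are assumptions for the sketch, not a standard:

```python
from dataclasses import dataclass, field
from enum import Enum

class IncidentType(Enum):
    BIAS = "bias/discrimination"
    ACCURACY = "accuracy failure"
    SAFETY = "safety issue"
    PRIVACY = "privacy breach"
    REGULATORY = "regulatory violation"
    REPUTATIONAL = "reputational harm"

class Severity(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4

@dataclass
class Incident:
    system: str
    incident_type: IncidentType
    severity: Severity
    contained: bool = False
    notifications: list = field(default_factory=list)

def requires_legal_review(incident: Incident) -> bool:
    # Illustrative policy: bias, privacy, and regulatory incidents always
    # loop in legal, as does anything HIGH severity or above.
    return incident.incident_type in {
        IncidentType.BIAS, IncidentType.PRIVACY, IncidentType.REGULATORY
    } or incident.severity.value >= Severity.HIGH.value
```

The point of pre-defining the types and severities in one place is that triage at 2 a.m. becomes a lookup, not a debate.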

The Artifacts

  • AI incident classification matrix (type × severity × response path)
  • AI incident response runbook (step-by-step for each incident type)
  • Regulatory notification obligation checklist (who must be told, when, in which jurisdictions)
  • Post-incident report template (timeline, root cause, impact, notifications, remediation, prevention)
  • AI incident register (log of all incidents and near-misses)
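
The incident register need not be elaborate; an append-only log is enough to start. A minimal sketch, assuming a CSV file and illustrative column names:

```python
import csv
import os

# Illustrative register columns; adapt to your post-incident report template.
FIELDS = ["id", "detected_at", "type", "severity", "near_miss", "root_cause", "status"]

def log_incident(path: str, record: dict) -> None:
    """Append one incident or near-miss record, writing the header if the file is new."""
    new_file = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow(record)
```

An append-only design matters here: a register that allows silent edits is weaker evidence of a functioning governance program than one that only grows.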

The Output

A documented incident response process with pre-assigned containment and escalation paths, a post-incident report for every material incident, regulatory notifications made on time, and controls updated to prevent recurrence.

Defining and classifying AI incidents

An AI incident is any event where an AI system produces outputs or takes actions that cause or could cause harm, violate applicable regulations, or materially deviate from intended behavior. This includes obvious failures (a model that consistently produces discriminatory outcomes) and subtle ones (a model whose accuracy has degraded below its approved threshold, or a generative AI system that produced a materially false output used in an external communication).

Incident classification drives response. Bias or discrimination incidents require immediate containment and legal review. Accuracy failures require a risk assessment of the decisions affected. Privacy breaches may trigger data protection notification obligations with specific deadlines. Safety incidents involving physical harm, or the risk of it, are the most urgent. Classify on detection, not after investigation.
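
The classification matrix from the artifacts list can be encoded as a simple lookup from type and severity to a pre-assigned response path. The entries below are illustrative placeholders, not recommended policy:

```python
# Illustrative classification matrix: (incident type, severity) -> response path.
# Real matrices are organization-specific and should cover every combination.
RESPONSE_MATRIX = {
    ("bias", "high"): "pause system; legal review; notify governance board",
    ("bias", "low"): "enhanced monitoring; schedule fairness re-test",
    ("accuracy", "high"): "pause system; risk-assess affected decisions",
    ("accuracy", "low"): "enhanced monitoring; retrain on schedule",
    ("privacy", "high"): "contain; start data-protection notification clock",
    ("safety", "high"): "immediate shutdown; executive escalation",
}

def response_path(incident_type: str, severity: str) -> str:
    # Fall back to manual triage when a combination is not pre-assigned.
    return RESPONSE_MATRIX.get((incident_type, severity), "escalate for manual triage")
```

The fallback line is deliberate: an unmapped combination should surface as an escalation, not fail silently.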

The response sequence

Containment comes before investigation. Do not spend time on root cause while the system is still running and producing harmful outputs. The containment decision — pause, restrict, or continue with enhanced monitoring — should be made within hours of a confirmed incident, not days. Pre-assign who makes this decision for each high-risk system in your incident response plan.

Investigation should be systematic and documented. The root cause analysis needs to answer four questions: what specifically went wrong, when it started, how many decisions or outputs were affected, and why existing monitoring did not catch it earlier. That last question is as important as the first. A monitoring system that failed to detect an incident is itself a finding that requires remediation.

Regulatory notification obligations

Several regulatory frameworks impose AI incident notification obligations. The EU AI Act requires providers of high-risk AI systems to report serious incidents to national market surveillance authorities without undue delay. For incidents involving personal data, GDPR breach notification requirements run concurrently: the supervisory authority must be notified within 72 hours of awareness unless the breach is unlikely to result in a risk to individuals, and affected individuals must be notified where the risk to them is high. Sector-specific regulators in financial services and healthcare have their own notification frameworks that may apply depending on the nature of the incident.

Build a notification decision tree into your incident response runbook. The decision tree should identify, for each incident type and severity, which notification obligations are triggered, to whom, within what timeframe, and who has authority to make the notification. Notification obligations often have short deadlines — the 72-hour GDPR clock starts running when you become aware of a breach, not when you finish your investigation.
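
A minimal sketch of such a decision tree, assuming a GDPR-style 72-hour clock that starts at awareness. The rules, authority names, and the 15-day placeholder deadline are assumptions for illustration, not legal advice:

```python
from datetime import datetime, timedelta

# Illustrative notification rules: predicate -> (authority, deadline from awareness).
# Actual obligations and deadlines depend on jurisdiction, sector, and incident type.
NOTIFICATION_RULES = [
    (lambda i: i["personal_data"],
     "data protection supervisory authority", timedelta(hours=72)),
    (lambda i: i["high_risk_ai"] and i["serious"],
     "market surveillance authority", timedelta(days=15)),  # placeholder deadline
]

def notification_obligations(incident: dict, aware_at: datetime):
    """Return (authority, notify-by) pairs triggered by this incident.

    The clock runs from awareness, not from the end of the investigation.
    """
    return [
        (authority, aware_at + deadline)
        for predicate, authority, deadline in NOTIFICATION_RULES
        if predicate(incident)
    ]
```

Encoding the rules this way makes the key property of the runbook explicit: the notify-by timestamp is computed from the moment of awareness, so an unfinished investigation never pauses the clock.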