AI Governance Institute

Practical Governance for Enterprise AI

Change Management
CHM · Change Management · CHM-003 · Medium effort · Agent-relevant

Model Rollback and Emergency Shutdown

Maintain tested procedures to rapidly revert an AI system to a prior version or disable it entirely in response to detected failures or safety events.

Objective

Limit harm from AI system failures by ensuring rollback and shutdown capabilities exist, are tested, and can be executed quickly.

Maturity Levels

  1. Initial: No rollback or shutdown procedures exist; recovery from failures requires unplanned engineering work.
  2. Developing: Rollback is technically possible but procedures are not documented or tested.
  3. Defined: Documented rollback and shutdown procedures are in place with defined decision criteria and responsibility assignments.
  4. Managed: Procedures are tested quarterly; rollback time is measured; results inform procedure improvements.
  5. Optimizing: Rollback can be triggered automatically based on monitored thresholds; shutdown requires only one authorized action.

Evidence Requirements

What an auditor or assessor would expect to see for this control.

  • Rollback procedure documentation with steps, role assignments, and a target recovery time objective (RTO)
  • Rollback test or drill records confirming the procedure has been exercised and the RTO was met
  • Emergency shutdown configuration and authorization matrix showing who may trigger shutdown under which conditions
  • Post-rollback review records for any rollback events in production, including root cause and prevention actions
  • Recovery time measured in the most recent test or actual event, compared against the defined RTO

Implementation Notes

Key steps

  • Define the shutdown decision criteria before you need them: what metric threshold, incident type, or regulator instruction triggers immediate shutdown?
  • Test rollback procedures regularly: a live incident is the wrong time to discover that rollback takes 6 hours when you expected 30 minutes.
  • Assign rollback authority explicitly: who can authorize an emergency shutdown without normal approval chains?
  • For customer-facing AI systems, prepare a communication template for notifying users when AI functionality is disabled.

Example Implementation

Customer-facing AI assistant with a documented incident response plan

Rollback and Shutdown Runbook — Customer AI Assistant

Shutdown decision criteria (any of the following):

  • Error rate > 10% sustained over 5 minutes
  • Confirmed prompt injection producing harmful outputs
  • Regulatory direction to suspend operation
  • AI incident classified P1 (see IRC-001)
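The criteria above can be evaluated mechanically. The following is a minimal Python sketch, not part of the runbook itself: the class and function names (`SustainedErrorRateMonitor`, `shutdown_reasons`) are hypothetical, and it assumes error-rate samples arrive with a timestamp in seconds. It treats "sustained" as meaning every sample since the first breach has stayed above the threshold.

```python
from typing import List, Optional

ERROR_RATE_THRESHOLD = 0.10  # runbook: error rate > 10%
SUSTAIN_SECONDS = 5 * 60     # runbook: sustained over 5 minutes


class SustainedErrorRateMonitor:
    """Hypothetical helper: detects an error rate that stays above a threshold."""

    def __init__(self, threshold: float = ERROR_RATE_THRESHOLD,
                 sustain_seconds: float = SUSTAIN_SECONDS) -> None:
        self.threshold = threshold
        self.sustain_seconds = sustain_seconds
        self._breach_started_at: Optional[float] = None

    def observe(self, timestamp: float, error_rate: float) -> bool:
        """Record one sample; return True once the breach has lasted the full window."""
        if error_rate > self.threshold:
            if self._breach_started_at is None:
                self._breach_started_at = timestamp  # first sample of this breach
            return timestamp - self._breach_started_at >= self.sustain_seconds
        # Any sample at or below the threshold resets the window.
        self._breach_started_at = None
        return False


def shutdown_reasons(error_rate_sustained: bool,
                     harmful_prompt_injection: bool,
                     regulatory_suspension: bool,
                     incident_priority: Optional[str]) -> List[str]:
    """Return every runbook criterion that is currently met (empty list: keep running)."""
    reasons = []
    if error_rate_sustained:
        reasons.append("Error rate > 10% sustained over 5 minutes")
    if harmful_prompt_injection:
        reasons.append("Confirmed prompt injection producing harmful outputs")
    if regulatory_suspension:
        reasons.append("Regulatory direction to suspend operation")
    if incident_priority == "P1":
        reasons.append("AI incident classified P1")
    return reasons
```

Returning the list of matched criteria, rather than a bare boolean, gives the incident record a stated reason for the shutdown decision.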

Rollback authority: Engineering on-call (immediate), or AI Governance Lead / CISO (after hours)

Shutdown authority: Any two of CTO, CISO, AI Governance Lead; a single approver is permitted for P1 incidents outside business hours
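The shutdown authority rule can be expressed as a small check. This is an illustrative Python sketch under stated assumptions: the function name `shutdown_authorized` and the representation of approvers as role-name strings are hypothetical, and a real system would verify identities rather than trust role labels.

```python
from typing import Iterable

# Roles named in the runbook as eligible shutdown approvers.
SHUTDOWN_APPROVERS = frozenset({"CTO", "CISO", "AI Governance Lead"})


def shutdown_authorized(approver_roles: Iterable[str],
                        is_p1: bool,
                        outside_business_hours: bool) -> bool:
    """Apply the two-approver rule, with the single-approver P1 exception."""
    # Ignore anyone who is not on the authorization matrix; duplicates count once.
    valid = set(approver_roles) & SHUTDOWN_APPROVERS
    required = 1 if (is_p1 and outside_business_hours) else 2
    return len(valid) >= required
```

For example, `shutdown_authorized({"CISO"}, is_p1=True, outside_business_hours=True)` evaluates to `True`, while the same single approver during business hours does not satisfy the rule.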

Rollback procedure:

  1. Confirm rollback target version in model registry
  2. Execute: ./scripts/ai-rollback.sh --env prod --target <version>
  3. Verify: run smoke test suite against rolled-back version
  4. Notify: post to #ai-incidents Slack channel with version, time, reason

Estimated time to complete: 8 minutes (last tested 2026-03-01; actual: 7 min 22 sec)
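The execute, verify, and notify steps above can be wrapped so that every rollback or drill records its own elapsed time against a target. The following Python sketch is one possible shape, not the runbook's actual tooling: only the `./scripts/ai-rollback.sh` command and its `--env` and `--target` flags come from the runbook, while `run_rollback`, `RollbackResult`, and the injected `smoke_test`, `notify`, `execute`, and `clock` callables are hypothetical. Confirming the target version in the model registry (step 1) is left to the caller.

```python
from __future__ import annotations

import subprocess
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class RollbackResult:
    version: str
    elapsed_seconds: float
    smoke_tests_passed: bool
    within_target: bool


def build_rollback_command(version: str, env: str = "prod") -> list[str]:
    """Step 2 command from the runbook, as an argument list (no shell interpolation)."""
    return ["./scripts/ai-rollback.sh", "--env", env, "--target", version]


def _run_script(command: list[str]) -> None:
    # check=True raises CalledProcessError if the script exits non-zero.
    subprocess.run(command, check=True)


def run_rollback(version: str,
                 target_seconds: float,
                 smoke_test: Callable[[], bool],
                 notify: Callable[[str], None],
                 execute: Callable[[list[str]], None] = _run_script,
                 clock: Callable[[], float] = time.monotonic) -> RollbackResult:
    """Run steps 2-4 and measure elapsed time against the target."""
    started = clock()
    execute(build_rollback_command(version))  # step 2: execute
    passed = smoke_test()                     # step 3: verify
    elapsed = clock() - started
    result = RollbackResult(version, elapsed, passed, elapsed <= target_seconds)
    status = "passed" if passed else "FAILED"
    outcome = "met" if result.within_target else "missed"
    notify(                                   # step 4: notify
        f"Rollback to {version}: {elapsed:.0f}s elapsed, "
        f"smoke tests {status}, target of {target_seconds:.0f}s {outcome}"
    )
    return result
```

Passing `execute` and `clock` as parameters lets a drill exercise the same code path with stand-ins, and the returned `RollbackResult` can be filed as the test record that the evidence requirements call for.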

User communication template: Pre-drafted in /runbooks/ai-outage-comms.md — covers in-app message and status page update

Control Details

  • Control ID: CHM-003
  • Typical owner: AI Engineering / CISO / Operations
  • Implementation effort: Medium effort
  • Agent-relevant: Yes

Tags

rollback, emergency shutdown, incident response, business continuity