AI Graceful Degradation

Define and implement fallback behavior for AI systems when they are unavailable, underperforming, or producing outputs below acceptable quality thresholds.

Objective

Maintain continuity of critical operations and prevent harm when AI systems fail by ensuring a defined, tested fallback path always exists.

Maturity Levels

Initial

No fallback exists; AI system failures cause process failures.

Developing

Fallback paths exist informally for some use cases but are not documented or tested.

Defined

Fallback procedures are documented for all AI-dependent processes, including manual process alternatives and user communication templates.

Managed

Fallback activation is tracked; recovery time is measured against defined SLAs.

Optimizing

Fallback triggers are automated based on monitored thresholds; degradation scenarios are tested quarterly.

Evidence Requirements

What an auditor or assessor would expect to see for this control.

—Degradation design documentation specifying fallback behaviors, trigger conditions, and user communication approach for each AI system

—Failover test records confirming fallback paths activate correctly under simulated failure conditions

—Degradation event logs showing instances where fallback mode was triggered in production

—User notification records confirming affected users were informed when AI capabilities were degraded

—Post-degradation review records for significant events, including root cause and time-to-recovery
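One way to make the event-log and post-degradation review evidence easy to produce is to record each fallback activation as a structured entry. The following is a minimal sketch only: the `DegradationEvent` class and all of its field names are hypothetical and not prescribed by this control.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class DegradationEvent:
    """One fallback activation (hypothetical schema, for illustration only)."""

    system: str                      # which AI system degraded
    trigger: str                     # condition that activated the fallback
    fallback_mode: str               # what ran instead of the AI
    started_at: datetime
    recovered_at: Optional[datetime] = None
    users_notified: bool = False
    root_cause: Optional[str] = None

    def time_to_recovery(self) -> Optional[timedelta]:
        """Elapsed time in fallback mode, or None if still degraded."""
        if self.recovered_at is None:
            return None
        return self.recovered_at - self.started_at

    def to_log_line(self) -> str:
        """Serialize to one JSON line suitable for an append-only log."""
        record = asdict(self)
        record["started_at"] = self.started_at.isoformat()
        record["recovered_at"] = (
            self.recovered_at.isoformat() if self.recovered_at is not None else None
        )
        ttr = self.time_to_recovery()
        record["time_to_recovery_seconds"] = (
            ttr.total_seconds() if ttr is not None else None
        )
        return json.dumps(record, sort_keys=True)
```

A record like this covers the trigger, the user notification status, the root cause, and time-to-recovery in one place, which may simplify assembling evidence for an assessor.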

Implementation Notes

Key steps

For every AI-dependent process, define what happens when the AI is unavailable: a manual process, cached output, partial functionality, or a graceful error.
Test fallback paths under realistic conditions before deployment; a fallback that turns out to be broken during an incident is costly.
For customer-facing AI, prepare user communications for degraded mode: be transparent about what is unavailable and what users can do instead.
Apply circuit breaker patterns for AI API calls: if an API returns errors above a threshold, fail fast to the fallback rather than queuing requests.
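The circuit breaker step can be sketched as follows. This is a minimal, single-threaded illustration rather than a production implementation: the `CircuitBreaker` class, its parameter names, and its default values are assumptions, and the primary call is assumed to raise an exception on errors and timeouts (for example via the HTTP client's own timeout setting).

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Fail fast to a fallback after repeated failures of a primary call."""

    def __init__(
        self,
        failure_threshold: int = 3,      # illustrative default
        open_seconds: float = 60.0,      # illustrative default
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.open_seconds = open_seconds
        self._clock = clock
        self._consecutive_failures = 0
        self._opened_at: Optional[float] = None

    def is_open(self) -> bool:
        """True while calls should skip the primary and use the fallback."""
        if self._opened_at is None:
            return False
        # Once the cool-down has elapsed, let a trial call through (half-open).
        return self._clock() - self._opened_at < self.open_seconds

    def call(self, primary: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.is_open():
            return fallback()  # fail fast: no queuing, no retry
        try:
            result = primary()
        except Exception:
            self._consecutive_failures += 1
            tripped = self._consecutive_failures >= self.failure_threshold
            if tripped or self._opened_at is not None:
                # Open the circuit, or re-open it after a failed trial call.
                self._opened_at = self._clock()
            return fallback()
        # Success closes the circuit and resets the failure count.
        self._consecutive_failures = 0
        self._opened_at = None
        return result
```

Injecting the clock keeps the breaker testable without real waiting; a multi-threaded service would additionally need a lock around the state updates.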

Example Implementation

E-commerce platform using AI for product recommendations and customer support chat

Graceful Degradation Plan — AI Features

Feature	Degraded Mode	Trigger	User Communication
Product recommendations	Show rule-based bestsellers (pre-computed)	AI API error rate > 10% for 2 min	None — seamless fallback
Support chat (AI response)	Route all chats to human queue	AI API unavailable or latency > 8s	"We're connecting you with a support specialist"
Search ranking (AI-enhanced)	Standard text-search ranking	AI scoring service unavailable	None — seamless fallback
Order anomaly detection	Queue all flagged orders for manual review	Model unavailable	Internal alert to Fraud team
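A trigger such as "AI API error rate > 10% for 2 min" could be evaluated with a rolling window over recent calls. The sketch below interprets the trigger as the error rate across a trailing 2-minute window, which is an assumption; the `ErrorRateTrigger` class and the `min_calls` guard (which avoids tripping on a handful of calls) are hypothetical and not part of the plan above.

```python
from collections import deque
from typing import Deque, Tuple


class ErrorRateTrigger:
    """Signal degraded mode when the recent error rate exceeds a threshold."""

    def __init__(
        self,
        threshold: float = 0.10,        # error rate above which to degrade
        window_seconds: float = 120.0,  # trailing window length
        min_calls: int = 20,            # hypothetical guard against tiny samples
    ) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.min_calls = min_calls
        # (timestamp, is_error) pairs, assumed to arrive in time order.
        self._events: Deque[Tuple[float, bool]] = deque()

    def record(self, timestamp: float, is_error: bool) -> None:
        """Record the outcome of one AI API call."""
        self._events.append((timestamp, is_error))
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        # Drop calls that have fallen out of the trailing window.
        while self._events and self._events[0][0] <= now - self.window_seconds:
            self._events.popleft()

    def should_degrade(self, now: float) -> bool:
        """True if the error rate within the window exceeds the threshold."""
        self._evict(now)
        total = len(self._events)
        if total < self.min_calls:
            return False
        errors = sum(1 for _, is_error in self._events if is_error)
        return errors / total > self.threshold
```

The serving path would call `record` after every AI API call and check `should_degrade` before choosing between the AI result and the degraded mode for that feature.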

Circuit breaker config: AI API calls fail fast after 3 consecutive errors or timeout > 5s; circuit stays open for 60 seconds before retry

Fallback test schedule: Quarterly — each fallback mode is triggered in staging and verified to activate correctly; results logged in degradation test register

Communication template: Pre-drafted status page and in-app banner for extended AI outage (> 30 min) stored at /runbooks/ai-degradation-comms.md
