Critical Infrastructure AI Risk Assessment and Containment
Define a sector-specific risk assessment process for AI systems deployed in critical infrastructure environments — including energy, water, transportation, and financial market infrastructure — that addresses operational technology (OT) blast-radius containment, consequence-of-failure analysis, and cross-sector dependency risk distinct from standard enterprise AI risk frameworks.
Objective
Prevent AI system failures from cascading into critical infrastructure disruptions or physical harm, by requiring consequence-of-failure analysis, OT environment isolation controls, and government coordination protocols before AI is deployed in systems where failure could affect public safety or national security.
Maturity Levels
Initial
AI systems deployed in operational technology environments are assessed under the same enterprise AI risk framework as IT-environment deployments, without specific consideration of OT blast radius, physical consequence of failure, or sector-specific regulatory requirements.
Developing
The CISO or OT security team reviews AI deployments in OT environments for cybersecurity risk, but consequence-of-failure analysis and cross-sector dependency assessment are not systematically conducted. Sector-specific regulatory reporting obligations are only partially understood.
Defined
A critical infrastructure AI risk assessment process defines: consequence-of-failure categories (safety, reliability, national security), OT network isolation requirements for AI systems with control authority, blast-radius containment controls specifying what assets an AI system can affect if it fails or is compromised, and sector-specific regulatory reporting obligations. All AI systems with OT control authority or safety-system integration are assessed under this process.
Managed
Blast-radius assessments are updated when the AI system's integration scope changes or when OT topology changes. Cross-sector dependency analysis is conducted for AI systems whose failure could cascade beyond the immediate operational domain. AI systems in critical infrastructure are included in OT incident response exercises. CISA reporting obligations for AI-related incidents in covered sectors are tracked and implemented.
Optimizing
Critical infrastructure AI risk assessments are coordinated with sector-specific Information Sharing and Analysis Centers (ISACs) where applicable. The organization participates in government-industry AI resilience working groups. Risk assessments are updated to reflect evolving threat intelligence from CISA and sector ISACs.
Evidence Requirements
What an auditor or assessor would expect to see for this control.
- Consequence-of-failure analysis for each AI system with OT control authority or safety-system integration, documenting failure scenarios and maximum harm assessments.
- OT network isolation architecture documentation showing AI system connectivity, data flows, and separation from control networks.
- Blast-radius assessment defining which physical assets and processes can be affected by AI system outputs, with containment controls for each.
- Sector-specific regulatory obligation mapping for AI systems in scope for NERC CIP, AWIA, or equivalent sector regulations.
- OT incident response exercise records showing critical infrastructure AI failure scenarios were included.
Implementation Notes
Why critical infrastructure AI requires a separate risk framework
The fundamental difference between AI in critical infrastructure and AI in enterprise settings is the consequence of failure. When an enterprise AI system fails:
- Productivity is lost
- Decisions may be wrong
- Data may be exposed
- Compliance may be violated
When an AI system in critical infrastructure fails:
- The power grid may lose stability
- Water treatment may produce unsafe output
- Transportation systems may route incorrectly under emergency conditions
- Financial market infrastructure may be disrupted in ways that cascade across markets
This consequence asymmetry requires a different risk framework — one that begins with consequence-of-failure analysis rather than data classification or regulatory compliance.
Consequence-of-failure analysis
For every AI system considered for deployment in a critical infrastructure environment, the risk assessment must answer:
If this AI system fails completely (stops functioning):
- What operational processes are affected?
- What is the time-to-impact (how quickly does failure propagate to operations)?
- What human backup processes exist and how long can operations continue manually?
- Is there a safety system that takes over, and does the AI interact with that safety system?
If this AI system produces incorrect output (outputs the wrong recommendation or action):
- What is the maximum harm from a single incorrect output?
- Is the incorrect output immediately detectable by operators?
- How much time exists to detect and correct before the incorrect output becomes irreversible?
- Can the system be placed in manual override mode, and how quickly?
If this AI system is compromised by an adversary:
- What is the worst-case scenario if the AI is directing the system to produce maximum harm?
- Is the AI system's output validated by any independent check before it affects physical systems?
- What blast radius is available to the adversary through this AI system?
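One way to make these questions auditable is to record the answers per failure mode and flag combinations that need attention before deployment. The sketch below is a minimal illustration, not a prescribed schema: the `FailureModeAssessment` fields, the `review_flags` helper, and the flag wording are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FailureModeAssessment:
    """Answers to the consequence-of-failure questions for one failure mode."""
    mode: str                            # e.g. "complete_failure", "incorrect_output", "adversarial"
    affected_processes: List[str]
    time_to_impact_min: float            # how quickly the failure reaches operations
    time_to_detect_min: Optional[float]  # None means operators cannot detect it
    manual_backup_hours: float           # how long operations can continue manually
    independent_validation: bool         # is output checked before it affects physical systems?


def review_flags(a: FailureModeAssessment) -> List[str]:
    """Return findings the assessment team should resolve (illustrative rules only)."""
    flags = []
    if a.time_to_detect_min is None:
        flags.append("not operator-detectable")
    elif a.time_to_detect_min >= a.time_to_impact_min:
        # Operators would notice only after the failure has reached operations.
        flags.append("detection slower than impact")
    if not a.independent_validation:
        flags.append("no independent validation")
    if a.manual_backup_hours <= 0:
        flags.append("no manual backup")
    return flags
```

A record flagged with "detection slower than impact" suggests the deployment relies on something other than operator vigilance, such as an independent validation check, to stop an incorrect output in time.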
OT network isolation and blast-radius containment
AI systems with any form of control authority in OT environments must be assessed for blast radius: the set of physical assets and processes that can be affected by the AI system's outputs, whether through direct control or through advisory outputs that operators are likely to follow.
Blast-radius containment principles:
- Network segmentation: AI systems should not bridge OT control networks and IT networks. AI processing should occur on a dedicated AI inference node that receives OT data via a one-way data diode or a highly constrained interface; outputs should be validated before reaching control systems.
- Output validation gates: AI system outputs that directly or indirectly affect physical system state should pass through an independent validation check before execution. This validation should be based on engineering constraints (setpoint limits, rate-of-change limits, safety system state) rather than another AI model.
- Autonomy limits in OT: AI systems in OT environments should default to advisory output (operator must confirm) for any action above a defined consequence threshold. Autonomous control authority should be limited to low-consequence, high-frequency operational decisions that are individually reversible.
- Fail-safe defaults: AI system failure (loss of output, output timeout, error state) should drive the controlled process to a safe state, not to the last-known-good recommendation. Safe state defaults are defined by the OT engineering team and encoded in the control system independent of the AI system.
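The output validation gate and fail-safe default can be sketched as a small deterministic check that sits between the AI output and the control action. This is an illustration of the logic only, under stated assumptions: `EngineeringLimits`, `gate`, the decision labels, and the choice to hold the current setpoint on rejection are hypothetical, and in a real deployment this logic would be implemented in the control system itself rather than alongside the AI.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class EngineeringLimits:
    min_setpoint: float
    max_setpoint: float
    max_step: float       # largest allowed change per control interval
    safe_setpoint: float  # safe state defined by the OT engineering team


def gate(recommended: Optional[float], current: float, age_s: float,
         max_age_s: float, limits: EngineeringLimits,
         safety_system_ok: bool) -> Tuple[float, str]:
    """Return (setpoint to apply, decision) for one AI recommendation."""
    # Fail-safe: missing, non-finite, or stale output drives to the safe state,
    # never to the last-known-good recommendation.
    if recommended is None or not math.isfinite(recommended) or age_s > max_age_s:
        return limits.safe_setpoint, "fail_safe"
    # Assumption: while the safety system is not in a normal state, AI output
    # is ignored and the current setpoint is held for operators to handle.
    if not safety_system_ok:
        return current, "rejected_safety_state"
    if not (limits.min_setpoint <= recommended <= limits.max_setpoint):
        return current, "rejected_out_of_range"
    if abs(recommended - current) > limits.max_step:
        return current, "rejected_rate_of_change"
    return recommended, "accepted"
```

For example, with limits of 0 to 100, a maximum step of 5, and a safe setpoint of 20, a fresh recommendation of 52 against a current value of 50 is accepted, a recommendation of 70 is rejected on rate of change, and a timed-out recommendation yields the safe setpoint of 20. Every branch is a plain engineering constraint, so the gate does not depend on another model being correct.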
Sector-specific regulatory considerations
Energy (NERC CIP): NERC CIP standards govern cybersecurity for bulk electric system assets. AI systems that process or control BES assets may be in scope for CIP-005 (electronic security perimeters), CIP-007 (systems security management), and CIP-010 (configuration change management). CIP compliance obligations apply to the AI system infrastructure if the AI system is associated with BES Cyber Assets.
Water (AWIA 2018): America's Water Infrastructure Act requires risk and resilience assessments and emergency response plans for community water systems. AI systems involved in treatment control, distribution monitoring, or emergency response should be included in AWIA risk assessments.
Financial market infrastructure (SEC, CFTC, FINRA): AI systems used in trading, clearing, or market surveillance are subject to change management and business continuity requirements from multiple regulators. AI system failures that affect market operations may require regulatory notification.
Cross-sector dependency: Many critical infrastructure AI failures are not self-contained. An AI system managing grid frequency control may fail because of a dependency on a cloud-based weather data feed. The risk assessment must map these dependencies, including third-party AI components and cloud infrastructure dependencies, to identify cascade scenarios.
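Dependency mapping of this kind can be approximated as a graph traversal: record what each component depends on, then ask which components are reachable from a failed dependency. The sketch below assumes a hypothetical dependency map (`DEPENDS_ON`, with made-up component names loosely based on the grid frequency example) and a hypothetical helper `impacted_by`; a real assessment would populate the map from architecture documentation and supplier inventories.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical map: each key depends on the components listed as its value.
DEPENDS_ON: Dict[str, List[str]] = {
    "grid_frequency_ai": ["weather_feed", "scada_telemetry"],
    "weather_feed": ["cloud_region_a"],
    "dispatch_software": ["grid_frequency_ai"],
    "scada_telemetry": [],
    "cloud_region_a": [],
}


def impacted_by(failed: str, depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return every component that directly or transitively depends on `failed`."""
    # Invert the map: component -> components that depend on it.
    dependents: Dict[str, List[str]] = {}
    for node, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(node)

    impacted: Set[str] = set()
    queue = deque([failed])
    while queue:  # breadth-first walk up the dependency chain
        current = queue.popleft()
        for node in dependents.get(current, []):
            if node not in impacted:
                impacted.add(node)
                queue.append(node)
    impacted.discard(failed)  # only relevant if the map contains a cycle
    return impacted
```

Calling `impacted_by("cloud_region_a", DEPENDS_ON)` returns the weather feed, the grid frequency AI, and the dispatch software, which is the cascade scenario the assessment is meant to surface.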
Example Implementation
Critical Infrastructure AI — Blast-Radius Assessment (excerpt)
System: Load Forecast AI (Distribution Grid Optimization)
OT environment: Electric distribution control (SCADA interface)
Assessment date: [Date]
Assessed by: OT Security + AI Governance
Consequence-of-failure analysis:
| Failure mode | Affected systems | Time to impact | Human backup | Maximum consequence |
|---|---|---|---|---|
| Complete system failure (no output) | Dispatch software receives no forecast; operators must use manual estimation | Immediate — dispatch relies on 15-min forecasts | Manual estimation feasible for 4–6 hours; degraded accuracy | Suboptimal dispatch decisions; potential reliability violations under high-load conditions |
| Systematically incorrect output (e.g., 20% underforecast) | Dispatch schedules insufficient generation reserves | 15–60 min (operators may not detect until load spike) | Manual override if operators identify discrepancy | Load shedding in affected distribution zones (up to 50,000 customers for 60–90 min) |
| Adversarial manipulation (output drives overloading) | Control room acts on incorrect forecast; transformer overload possible | 30–90 min if not caught | Control room operator monitoring protects against immediate damage | Transformer damage possible in worst case; restoration time 2–4 weeks |
Blast-radius assessment:
Current: AI output reaches dispatch software via validated API. Dispatch software enforces engineering setpoint limits before control action. AI has no direct control authority. Blast radius: advisory only — cannot cause physical damage without operator action and control system setpoint override.
Required containment controls:
- Output validation: dispatch software enforces setpoint limits independent of AI output
- Rate-of-change limits: dispatch software rejects AI recommendations exceeding ±15% from prior interval
- Fail-safe default: AI timeout → dispatch defaults to conservative reserve margin (engineering-defined)
- Anomaly detection: monitor AI output distribution for statistical anomalies indicating manipulation — NOT YET IMPLEMENTED. Required by Q3 2026.
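As a rough illustration of the last two controls, the sketch below shows a relative rate-of-change check and a simple rolling z-score monitor on AI outputs. It is not the system's actual implementation (the anomaly detection control is listed as not yet implemented): `within_rate_limit`, `OutputAnomalyMonitor`, the window size, and the z-score threshold are all assumptions. Load forecasts are strongly seasonal, so a production monitor would more plausibly score forecast residuals against a seasonal baseline than raw values.

```python
from collections import deque
from statistics import mean, stdev


def within_rate_limit(new: float, prior: float, max_fraction: float = 0.15) -> bool:
    """True if `new` is within +/- max_fraction of the prior interval's value."""
    return abs(new - prior) <= max_fraction * abs(prior)


class OutputAnomalyMonitor:
    """Flags outputs that deviate sharply from a rolling window of recent outputs."""

    def __init__(self, window: int = 96, z_threshold: float = 4.0, min_samples: int = 8):
        self.history = deque(maxlen=window)
        self.z_threshold = z_threshold
        self.min_samples = min_samples

    def check(self, value: float) -> bool:
        """Return True if `value` looks anomalous relative to recent history."""
        anomalous = False
        if len(self.history) >= self.min_samples:
            mu = mean(self.history)
            sigma = stdev(self.history)
            if sigma > 0:
                anomalous = abs(value - mu) / sigma > self.z_threshold
            else:
                anomalous = value != mu
        if not anomalous:
            # Flagged values are kept out of the baseline so that a sustained
            # manipulation cannot gradually normalize itself.
            self.history.append(value)
        return anomalous
```

Excluding flagged values from the baseline is a deliberate trade-off: it resists slow poisoning of the baseline, but a legitimate regime change would keep alerting until someone reviews and resets the monitor.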
NERC CIP scope determination: The AI system is not classified as a BES Cyber Asset. It receives read-only OT data from the SCADA system via a unidirectional data feed and sends advisory output only to non-critical dispatch software. CIP-005 perimeter: because the AI system resides inside an Electronic Security Perimeter, it is subject to CIP-007 patch management and CIP-010 change management. The CIP team has been informed.
