Kill-Switch Propagation Testing
Regularly test that halt commands propagate correctly through all subagent layers and parallel orchestration environments, stopping all agent activity within a defined time window.
Objective
Ensure the emergency halt capability established in AGT-008 actually works in complex multi-agent deployments where stop signals must traverse delegation chains and concurrent execution paths.
Maturity Levels
Initial
Kill-switch exists but propagation to subagents is untested; behavior in parallel orchestration is unknown.
Developing
Kill-switch propagation tested manually for simple single-agent deployments but not for multi-agent or parallel configurations.
Defined
Propagation tests are documented and executed for all production agent topologies on a scheduled basis; results are recorded.
Managed
Propagation latency is measured and tracked against an SLA; failures trigger remediation with a documented root cause.
Optimizing
Propagation tests run automatically on every topology change; latency budgets are enforced in CI/CD before agent deployment.
Evidence Requirements
What an auditor or assessor would expect to see for this control.
- Agent topology map documenting all agent relationships, delegation depths, and parallel execution paths in production
- Kill-switch propagation test plan with defined SLA (maximum latency from halt command to full cessation)
- Executed test results showing propagation latency, pass/fail status, and coverage of all production topologies
- Remediation records for any propagation failures, including root cause and fix verification
- Change-trigger log showing propagation tests were re-executed after topology changes
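One way to make the topology map and the change-trigger log verifiable is to fingerprint the map and compare it with the fingerprint recorded at the last test run. The sketch below assumes the map is kept as a simple parent-to-children dictionary with no cycles; the function names and the agent names in the usage lines are hypothetical and not part of any particular framework.

```python
import hashlib
import json
from typing import Dict, List, Optional

Topology = Dict[str, List[str]]  # assumed shape: agent name -> names of agents it can spawn


def topology_fingerprint(topology: Topology) -> str:
    """Return a stable SHA-256 hash of the map, independent of listing order."""
    canonical = {agent: sorted(children) for agent, children in topology.items()}
    blob = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def delegation_depth(topology: Topology, root: str) -> int:
    """Longest delegation chain below root (assumes the map is acyclic)."""
    children = topology.get(root, [])
    if not children:
        return 0
    return 1 + max(delegation_depth(topology, child) for child in children)


def needs_retest(topology: Topology, last_tested: Optional[str]) -> bool:
    """True when the map differs from the one covered by the last test run."""
    return topology_fingerprint(topology) != last_tested


if __name__ == "__main__":
    topo = {
        "orchestrator": ["research", "drafting"],
        "drafting": ["crm_update"],
        "research": [],
        "crm_update": [],
    }
    print(delegation_depth(topo, "orchestrator"), needs_retest(topo, None))
```

Storing the fingerprint alongside each test result gives an assessor a direct way to confirm that the tested topology matches the deployed one.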
Implementation Notes
Key steps
- Map every agent topology in production: document which agents can spawn subagents, which run in parallel, and the maximum delegation depth.
- Define a propagation SLA: the maximum elapsed time from halt command issuance to confirmed cessation of all agent activity across all nodes.
- Write test cases that cover: (1) single-agent halt, (2) orchestrator-to-subagent propagation, (3) parallel agent halt, (4) halt during an active tool call or irreversible action.
- Run tests on a scheduled cadence (at least quarterly) and again after any topology change; record pass/fail status, measured latency, and any agents that failed to halt.
- For agents that interact with external systems, verify that in-flight API calls are either completed cleanly or rolled back, never left in an ambiguous state.
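The propagation and parallel-halt test cases above can be prototyped against a simulated topology before being wired to a real agent runtime. The following is a minimal sketch, assuming agents are modelled as asyncio tasks that share a single halt signal; the `Agent` class, the `ignores_halt` flag, and `measure_propagation` are hypothetical names used only for illustration. It does not cover halting during an in-flight tool call, which needs hooks into the real external systems.

```python
import asyncio
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Agent:
    """Hypothetical agent node; children model spawned subagents."""
    name: str
    children: List["Agent"] = field(default_factory=list)
    ignores_halt: bool = False          # simulates a propagation defect
    stopped_at: Optional[float] = None  # set when the agent observes the halt

    async def run(self, halt: asyncio.Event) -> None:
        # Subagents run concurrently with their parent.
        tasks = [asyncio.create_task(child.run(halt)) for child in self.children]
        if self.ignores_halt:
            await asyncio.sleep(3600)   # never checks the halt signal
        await halt.wait()
        self.stopped_at = time.monotonic()
        await asyncio.gather(*tasks)    # return only after all subagents return


def walk(agent: Agent):
    """Yield the agent and every descendant."""
    yield agent
    for child in agent.children:
        yield from walk(child)


async def measure_propagation(root: Agent, sla_seconds: float) -> dict:
    """Issue one halt and report latency plus any agents that never stopped."""
    halt = asyncio.Event()
    runner = asyncio.create_task(root.run(halt))
    await asyncio.sleep(0.05)           # let every agent start before halting
    issued = time.monotonic()
    halt.set()
    try:
        await asyncio.wait_for(runner, timeout=sla_seconds)
    except asyncio.TimeoutError:
        pass                            # stragglers are reported below
    agents = list(walk(root))
    failed = [a.name for a in agents if a.stopped_at is None]
    stops = [a.stopped_at - issued for a in agents if a.stopped_at is not None]
    latency = max(stops) if stops else None
    passed = not failed and latency is not None and latency <= sla_seconds
    return {"passed": passed, "latency_s": latency, "failed_agents": failed}


if __name__ == "__main__":
    root = Agent("orchestrator", [Agent("research"), Agent("drafting", [Agent("crm_update")])])
    print(asyncio.run(measure_propagation(root, sla_seconds=5.0)))
```

Latency is taken as the slowest agent's stop time relative to halt issuance, which matches the SLA definition of elapsed time until all activity has ceased. An agent that never observes the signal appears in `failed_agents` rather than silently inflating the latency figure.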
Example Implementation
Enterprise deploying a customer-service orchestrator that spawns research, drafting, and CRM-update subagents
Kill-Switch Propagation Test Results — Q2 2026
Topology tested: Orchestrator → [Research Agent, Drafting Agent] → CRM Update Agent
| Test case | Halt command issued | All agents stopped | Latency | In-flight actions |
|---|---|---|---|---|
| Single agent (Drafting) | 14:02:01 | 14:02:01 | 0.3s | Completed cleanly |
| Orchestrator → 2 subagents | 14:05:00 | 14:05:02 | 2.1s | Research cancelled, Drafting completed |
| Halt during CRM write | 14:08:30 | 14:08:32 | 1.8s | CRM write rolled back |
| Parallel execution (3 agents) | 14:11:00 | 14:11:03 | 2.7s | All stopped |
SLA: 5 seconds. All tests passed. Finding: CRM rollback currently requires manual verification; add an automated confirmation check.
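The pass/fail judgment in these results can be derived mechanically from the recorded latencies, which is the kind of check a CI/CD latency budget would enforce. The sketch below uses the four latencies and the 5-second SLA from the table; the `evaluate` function and the result layout are assumptions for illustration.

```python
SLA_SECONDS = 5.0

# (test case, measured latency in seconds), copied from the results table
RESULTS = [
    ("Single agent (Drafting)", 0.3),
    ("Orchestrator -> 2 subagents", 2.1),
    ("Halt during CRM write", 1.8),
    ("Parallel execution (3 agents)", 2.7),
]


def evaluate(results, sla_seconds):
    """Summarize a test run: which cases breached the SLA and the remaining headroom."""
    failures = [name for name, latency in results if latency > sla_seconds]
    worst_case, worst_latency = max(results, key=lambda item: item[1])
    return {
        "all_passed": not failures,
        "failures": failures,
        "worst_case": worst_case,
        "worst_latency_s": worst_latency,
        "headroom_s": round(sla_seconds - worst_latency, 3),
    }


if __name__ == "__main__":
    summary = evaluate(RESULTS, SLA_SECONDS)
    print(summary)
    # A CI job could exit non-zero here to block deployment on an SLA breach.
    raise SystemExit(0 if summary["all_passed"] else 1)
```

Tracking the headroom figure over time shows whether propagation latency is drifting toward the SLA before any test actually fails.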
