Blast Radius

Blast radius is the scope of impact when a system component fails: which users are affected, which dependent services degrade, and how far the failure propagates across the broader environment before containment.

Assessing blast radius is the first job of an incident response team. Before deciding what to do about a failure, responders need to know what's actually broken: which user-facing functionality is degraded, which customer segments are affected, which downstream systems are seeing knock-on effects. A small blast radius (one feature, one customer cohort, one region) calls for a different response than a large one. Getting this assessment wrong in either direction is costly. Underestimating blast radius leads to under-escalation and prolonged customer impact. Overestimating it leads to unnecessary paging, war room sprawl, and rollback decisions that introduce more risk than they remove.

In modern distributed environments, blast radius is harder to assess than it used to be. A single failure can surface as symptoms several service boundaries away from where the failure actually originated, and the path of propagation depends on which other systems happened to be calling the failing component at the time. A failure in an identity service might affect checkout, search, and notifications simultaneously while leaving payments untouched, and the pattern can change minute to minute as different traffic patterns hit the affected component. Manually assembling this picture during a live incident is one of the slowest and most error-prone parts of incident response.

An AI SRE with a real-time Production World Model™ can produce a blast radius assessment automatically as part of its initial investigation. Because the system already knows the dependency graph, traffic patterns, and recent changes, it can identify in seconds which services and user populations are affected by a given failure, and which are not, despite alerting. The capability shifts the first job of the responder from "figure out what's broken" to "validate the AI's assessment and decide what to do about it." Blast radius assessment becomes a structured output, not a manual investigation.