On-call

On-call is the rotational duty in which engineers take responsibility for responding to production incidents during scheduled time windows, typically receiving pages for alerts, investigating issues, and either remediating or escalating as needed.
On-call is one of the foundational practices of SRE and one of the most consistent sources of engineer burnout. A typical rotation places one or more engineers on call for a week at a time, during which they must respond to alerts within tight response windows (often 5 to 15 minutes), regardless of the hour. The work is high-variance: most shifts are quiet, occasional shifts involve hours of intense investigation, and the unpredictability itself is a tax on the engineer's ability to plan their week.
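As a rough illustration of the mechanics described above, the sketch below models a weekly rotation and a response-window check. The roster names, the rotation start date, and the 15-minute window are all hypothetical values picked for the example; real paging tools handle overrides, handoffs, and escalation policies that are not shown here.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical roster and parameters, chosen only for illustration.
ROSTER = ["alice", "bob", "carol"]
ROTATION_START = datetime(2024, 1, 1, tzinfo=timezone.utc)  # a Monday, 00:00 UTC
SHIFT_LENGTH = timedelta(weeks=1)
ACK_WINDOW = timedelta(minutes=15)  # assumed upper end of the response window


def on_call_engineer(at: datetime) -> str:
    """Return who holds the pager at the given time under a weekly rotation."""
    # Integer number of whole shifts elapsed since the rotation began.
    shifts_elapsed = (at - ROTATION_START) // SHIFT_LENGTH
    return ROSTER[shifts_elapsed % len(ROSTER)]


def ack_within_window(paged_at: datetime, acked_at: datetime) -> bool:
    """Return True if the page was acknowledged inside the response window."""
    return timedelta(0) <= acked_at - paged_at <= ACK_WINDOW


if __name__ == "__main__":
    page = datetime(2024, 1, 10, 3, 0, tzinfo=timezone.utc)  # a 3 a.m. page
    print(on_call_engineer(page))
    print(ack_within_window(page, page + timedelta(minutes=12)))
```

The page in the usage example falls in the second week of the rotation, so it goes to the second engineer on the roster regardless of the hour, which is the property that makes the duty hard to plan around.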
The quality of incident response in traditional on-call models is heavily dependent on who is on call. A tenured engineer who has personally debugged three prior incidents involving the same dependency chain will resolve the incident faster and with fewer wrong turns than an engineer who joined eight months ago and has never seen this failure mode. That gap is not a training failure; it is a structural property of how operational knowledge is currently stored, which is primarily in human memory.
AI SRE changes the structure of on-call by externalizing operational and production knowledge into a queryable Production World Model™. The engineer paged at 3 a.m. now has access to the same context as the most experienced person on the team, not because they've been trained to the same depth, but because the knowledge is no longer locked in someone else's head. Organizations can staff on-call more broadly without accepting degraded quality, new team members become productive contributors in weeks rather than months, and the most experienced engineers spend fewer nights reconstructing dependency graphs the system already knows.