Estimated reading time: 8 minutes
How to safely introduce AI and agentic capabilities into your on-call processes.
That third page at 3am is exhausting, but it is more than a nuisance; it’s a signal that your incident response orchestration is broken.
While most teams blame noisy alerts, the real bottleneck is the manual effort required for humans to stitch together disparate data points to react effectively.
Google SRE teams aim to limit on-call bandwidth to 25% per engineer to prevent burnout and ensure a sustainable workload.
The percentage of time spent on active on-call duties is part of a strict 50/25/25 balance: at least 50% of an SRE‘s time must be dedicated to engineering projects that scale the service, while the remaining 50% is split between other operational tasks and active paging response.