Site reliability engineers apply software techniques to operations to maximize uptime and avoid costly outages. But is this approach right for your organization?
Engineering leaders are often judged on the uptime of the systems and applications their teams build and maintain. It doesn’t matter how cool your feature set is: if your site or application goes down or the response time is unreasonably slow, the C-suite won’t be happy.
Traditionally, maintaining site uptime and performance was the job of a dedicated operations team, but in recent years that role has largely been subsumed under the philosophical umbrella of DevOps.
Today, the discipline that keeps applications running and responsive is known as site reliability engineering (SRE), which applies software development and automation techniques to the task of maximizing uptime.