A terrible, horrible, no-good, very bad day at Slack

What can we learn from outages when they happen?
June 24, 2021

Outages are awful for users and teams alike. What can we learn from them when they happen?

On May 12, 2020, Slack had its first significant outage in a long time. We published a summary of the incident shortly after, but this story is an interesting one, and we’d like to go into more detail on the technical issues around it.

The user-visible outage began at 4:45pm Pacific time, but the story really begins around 8:30am that morning. Our Database Reliability Engineering team was alerted about a significant load increase in part of our database infrastructure at the same time as our Traffic team received alerts that we were failing some API requests.
