A terrible, horrible, no-good, very bad day at Slack

What can we learn from outages when they happen?

By Laura Nolan

June 24, 2021

Outages are awful for users and teams alike. What can we learn from them when they happen?

On May 12, 2020, Slack had its first significant outage in a long time. We published a summary of the incident shortly after, but this story is an interesting one, and we’d like to go into more detail on the technical issues around it.

The user-visible outage began at 4:45pm Pacific time, but the story really begins around 8:30am that morning. Our Database Reliability Engineering team was alerted about a significant load increase in part of our database infrastructure at the same time as our Traffic team received alerts that we were failing some API requests.

About the author

Laura Nolan
- @lauralifts