How much time do you have? Perhaps it’s our culture of impatience, the art of being a manager, or maybe just my old age, but I’ve begun to really appreciate the people around me who can communicate the current state of the world in a concise way. That next level of detail just isn’t always necessary.
Responding to critical incidents is a race against time. To start, you’re often trying to determine who is on that critical path to resolution. As more people get involved with the incident (thanks to Response Mobilizer), the overall amount of communication increases rapidly, especially if you’re using chat as a primary collaboration medium. Reinforcements also don’t arrive all at once. Your effectiveness as an Incident Commander in driving resolution depends on how quickly you can ramp up new responders on the current state of the world. Formally, this is known as establishing common ground. (Dan Slimmon of Exosite has a fantastic talk from Velocity Santa Clara 2016 on this exact topic if you’re interested.)
Many organizations are augmenting incident response with chat as a way to communicate during incidents. Like conference bridges, chat is great for interaction; and like conference bridges, chat is a horrible choice for establishing that common ground for newly joined responders. If this is how you do incident response, you’re likely wasting precious time and extending your time to resolution (TTR).
Why Your Chat Timeline is Ineffective
Many of our customers rely on chat for communication during a large-scale incident to encourage collaboration and transparency. However, an anti-pattern has begun to emerge around chat-driven incident response. For large-scale incidents, the cost of telling responders to “read the backlog” in order to ramp up on the incident is much higher than you might think. What does a newly joined responder really need to know?
- What is failing (e.g. known symptoms)? What is the known impact to customers?
- Who is currently engaged (e.g. who is the Incident Commander)?
- What actions have been taken? What decisions have been made? What hypotheses exist that explain what is happening?
You can certainly get this info from your chat timeline (or even by interrupting and asking everyone on the bridge!), but there’s gotta be a better way.
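To make the questions above concrete, here is a minimal sketch of a structured summary that answers them in one place. It is an illustration only: the `IncidentSummary` class, its field names, and the sample data are all hypothetical, and none of it reflects an actual PagerDuty API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IncidentSummary:
    """Hypothetical snapshot of what a newly joined responder needs to know."""
    symptoms: List[str]            # what is failing
    customer_impact: str           # known impact to customers
    incident_commander: str        # who is running the incident
    responders: List[str] = field(default_factory=list)
    actions_taken: List[str] = field(default_factory=list)
    decisions: List[str] = field(default_factory=list)
    hypotheses: List[str] = field(default_factory=list)

    def briefing(self) -> str:
        """Render the summary as a short, readable block of text."""
        lines = [
            f"Customer impact: {self.customer_impact}",
            f"Incident Commander: {self.incident_commander}",
            f"Engaged: {', '.join(self.responders) or '(nobody else yet)'}",
        ]
        sections = [
            ("Symptoms", self.symptoms),
            ("Actions taken", self.actions_taken),
            ("Decisions", self.decisions),
            ("Hypotheses", self.hypotheses),
        ]
        for title, items in sections:
            lines.append(f"{title}:")
            if items:
                lines.extend(f"  - {item}" for item in items)
            else:
                lines.append("  - (none recorded)")
        return "\n".join(lines)


# Invented sample data, purely for illustration.
summary = IncidentSummary(
    symptoms=["Checkout requests timing out"],
    customer_impact="Some customers cannot complete purchases",
    incident_commander="alex",
    responders=["sam", "priya"],
    actions_taken=["Rolled back the latest deploy"],
    hypotheses=["Database connection pool exhausted"],
)
print(summary.briefing())
```

A responder who joins late can read one briefing like this instead of scrolling back through the whole chat timeline.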
Notes for Speed
Today, we are introducing a new capability that makes it easier to surface incident notes, which capture the key actions and decisions made during a large-scale incident.
These notes can even be captured easily via the mobile app.
In our incident response at PagerDuty, we assign a dedicated Incident Scribe as one of our critical roles (in addition to Incident Commander). This person is responsible for the important “play-by-play” of the incident, which provides a simple way to ramp up new responders who join an in-flight incident. The scribe notes are also the best resource for building a curated timeline for post-mortems.
No more noisy chat timelines. Give this new feature a shot and let us know what you think!
Sign up for PagerDuty for free.