PagerDuty Blog

Better Incident Postmortems

While a major incident is ongoing, all of your focus is on restoring service: watch the smoke, figure out where the fire is, and put it out. But after service has been restored (the incident is resolved, the adrenaline has drained, and it’s peacetime), that’s the time to learn from what happened and then use those lessons to get better at responding to, resolving, and preventing future incidents. The core best practice that enables this cycle of improvement is the postmortem process, and PagerDuty is pleased to introduce integrated support for postmortems in our full lifecycle incident management platform! Coupled with several other PagerDuty capabilities, such as system and operational efficiency analytics and the Operations Command Console, we now provide everything you need to learn and proactively improve both the resiliency of your infrastructure and your resolution process.

PagerDuty improves all parts of the postmortem process, from building the timeline all the way through to tracking the status of postmortems. Construct a timeline with relevant PagerDuty and chat activity in minutes instead of hours, then use that detailed breakdown to efficiently investigate root cause, assess response effectiveness, and determine the most important follow-up actions. We’ve taken the friction out of conducting effective postmortems, so that more of your postmortem time can be focused on learning and less on manual work. How easy can your postmortems be? Let’s take a look!

Now you can kick off the postmortem process for an incident in a single click.

Investigate

With the postmortem report created, it’s time to roll up our sleeves and start investigating what actually happened. We’ll want to pull in activity from our already existing sources of communication and incident response: chat and PagerDuty. Our PagerDuty incident information was automatically associated with our new postmortem, so let’s add in the relevant chat channels:

Now we can review the combined activity available from the incident and these chat rooms, and include in the postmortem timeline exactly those bits that are most relevant to understanding how the incident played out. We want to cover several aspects of the incident: the technology systems involved, our response effectiveness, and resolution steps.
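To make the idea of a combined timeline concrete, here is a minimal sketch of what merging incident activity and chat messages into one chronological view involves. It is not PagerDuty's implementation or API: the `TimelineEntry` type, the `to_entry` and `build_timeline` helpers, the source labels, and the sample events are all hypothetical and exist only for illustration. The sketch assumes every event carries an ISO 8601 timestamp with a UTC offset.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TimelineEntry:
    """One item in a postmortem timeline (hypothetical structure)."""
    at: datetime      # timezone-aware timestamp, normalized to UTC
    source: str       # illustrative labels: "pagerduty" or "chat"
    summary: str
    note: str = ""    # optional annotation explaining relevance


def to_entry(timestamp: str, source: str, summary: str) -> TimelineEntry:
    """Parse an ISO 8601 timestamp with a UTC offset and normalize it to UTC."""
    parsed = datetime.fromisoformat(timestamp)
    if parsed.tzinfo is None:
        raise ValueError(f"timestamp has no UTC offset: {timestamp!r}")
    return TimelineEntry(parsed.astimezone(timezone.utc), source, summary)


def build_timeline(*sources: list[TimelineEntry]) -> list[TimelineEntry]:
    """Merge entries from every source into one chronological list."""
    merged = [entry for entries in sources for entry in entries]
    return sorted(merged, key=lambda entry: entry.at)


# Made-up sample data: incident activity logged in Pacific time, chat in UTC.
incident_log = [
    to_entry("2024-01-15T09:02:00-08:00", "pagerduty", "Incident triggered"),
    to_entry("2024-01-15T09:20:00-08:00", "pagerduty", "Incident resolved"),
]
chat_log = [
    to_entry("2024-01-15T17:10:00+00:00", "chat", "Rolling back the last deploy"),
]

timeline = build_timeline(incident_log, chat_log)
for entry in timeline:
    print(entry.at.isoformat(), entry.source, entry.summary)
```

Normalizing every timestamp to UTC before sorting is what removes the manual time-zone math: the chat message lands between the trigger and the resolution even though the two sources record time in different zones.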

Postmortem Timeline

Including an item in the postmortem timeline is also just a single click: no cut and paste, no switching between applications, and no error-prone manual time-zone math. The full range of PagerDuty activity can be included: incident state changes, notes, escalations, notifications, when additional responders were requested, when status updates were dispatched to stakeholders, and more. Once the activity is in the timeline, you can also annotate it to describe its relevance to the incident.

Analyze

With the timeline built out, we can continue on to the analysis phase. This consists of summarizing what happened, identifying the underlying root cause, calling out the path to resolution, and so on. This step is key as it enables the team to introspect on what worked well and where we could have done better, then identify the most important improvements to pursue as action items. All of this is easy to capture within the postmortem editor, which also provides instructions for approaching each of these sections:

And it’s as simple as that!

Streamline Postmortem Management

Not only is individual postmortem construction easier and more effective, the overall process is also significantly streamlined. All postmortems are available in the catalog.

This makes it easy to locate postmortems, identify impactful long-running incidents, and see which postmortems are still in progress and which are already complete. Postmortems can also be exported as PDFs for distribution or archiving, and both the report template and per-section instructions for authors can be customized to fit the needs of your organization. Together, all of these tools provide a complete end-to-end postmortem process that is both easy to use and easy to manage.

This suite of functionality helps you get the most from postmortems:

  • Timeline building is faster, less painful, and enables broader insights.
  • It’s far easier to manage the postmortem process with a simplified toolchain.
  • Your team can accelerate continuous improvement by getting more and better learnings, while spending less time on the process.

We hope that this capability makes it as easy as possible for your team to facilitate a culture of shared learning. And if you’re interested in learning more, download our free postmortem handbook for best practices on conducting effective postmortems.

PagerDuty Postmortems is included for all customers on our Standard and Enterprise plans. To get started, check out the support article here!