A post-mortem (or postmortem) is a process intended to help you learn from past incidents. It typically involves an analysis or discussion soon after an event has taken place.
As your systems scale and become more complex, failure is inevitable: assessment and remediation become more involved and time-consuming, and repeating past mistakes becomes increasingly painful. Not having data when you need it is expensive.
The good news is that most organizations do have some kind of post-mortem process in place to assess what happened once a service has been restored. Arguably, the resolution of an issue isn't truly complete until the team has fully documented and reflected on it.
However, conducting a post-mortem can be highly time-consuming: teams often spend hours on each one trying to piece together the chronology of events from different sources of information.
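Much of that effort goes into interleaving timestamped records from several tools. A minimal sketch of that step in Python, assuming each source can be exported as a list of events carrying a timestamp, a source label, and a message (the `Event` type, the source names, and the sample events are all hypothetical):

```python
import heapq
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    timestamp: datetime
    source: str   # hypothetical labels, e.g. "chat", "alerts", "notes"
    message: str


def build_timeline(*sources: list[Event]) -> list[Event]:
    """Merge per-source event lists into a single chronology.

    Each source is sorted by timestamp on its own, then heapq.merge
    interleaves the already-sorted lists without re-sorting everything.
    """
    sorted_sources = [sorted(events, key=lambda e: e.timestamp) for events in sources]
    return list(heapq.merge(*sorted_sources, key=lambda e: e.timestamp))


# Hypothetical sample data for a single incident.
chat = [Event(datetime(2024, 1, 10, 14, 7), "chat", "Rolling back deploy")]
alerts = [
    Event(datetime(2024, 1, 10, 14, 15), "alerts", "Error rate back to normal"),
    Event(datetime(2024, 1, 10, 14, 2), "alerts", "Error rate above threshold"),
]

for event in build_timeline(chat, alerts):
    print(f"{event.timestamp:%H:%M} [{event.source}] {event.message}")
# 14:02 [alerts] Error rate above threshold
# 14:07 [chat] Rolling back deploy
# 14:15 [alerts] Error rate back to normal
```

Sorting each source independently keeps the sketch tolerant of exports that arrive out of order, as the `alerts` list does here.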
Streamlining the post-mortem process is key to helping your team get the most from the time it invests in post-mortems: spending less time conducting them while extracting more useful lessons is a faster path to operational maturity. Ultimately, the true value of post-mortems lies in helping institutionalize a positive culture of frequent, iterative improvement.
NOTE: Organizations may refer to the post-mortem process in slightly different ways. Other terms we’ve heard in the industry include:
The specifics around conducting post-mortems vary from organization to organization. Regardless of the process, the primary purpose of post-mortems should be learning, whether it’s about the systems being managed, the process being followed, or how the organization executes during a crisis. Additional goals, including identification and implementation of system or process improvements, may be realized depending on the process followed.
In general, an effective post-mortem report tells a story. Incident post-mortem reports should include the following:
Which services and customers were affected? How long and severe was the issue? Who was involved in the response? How did we ultimately fix the problem?
What were the origins of failure? Why do we think this happened?
What actions were taken? Which were effective? Which were detrimental?
Centralize key activities from chat conversations, incident details, and more.
What went well? What didn’t go well? How do we prevent this issue from happening again?
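One way to keep that story consistent from incident to incident is to start every report from the same skeleton. The Python sketch below turns the guiding questions above into a plain-text template and flags sections that are still empty. The section headings and the example incident are hypothetical labels, not part of any PagerDuty template.

```python
# Section headings are hypothetical; the questions come from the outline above.
SECTIONS: list[tuple[str, list[str]]] = [
    ("Overview", [
        "Which services and customers were affected?",
        "How long and severe was the issue?",
        "Who was involved in the response?",
        "How did we ultimately fix the problem?",
    ]),
    ("Root cause", [
        "What were the origins of failure?",
        "Why do we think this happened?",
    ]),
    ("Response actions", [
        "What actions were taken?",
        "Which were effective?",
        "Which were detrimental?",
    ]),
    ("Timeline", [
        "Key activities from chat conversations, incident details, and more.",
    ]),
    ("Lessons learned", [
        "What went well?",
        "What didn't go well?",
        "How do we prevent this issue from happening again?",
    ]),
]


def render_template(incident_title: str) -> str:
    """Return a plain-text post-mortem skeleton with one block per section."""
    lines = [f"Post-mortem: {incident_title}", ""]
    for heading, questions in SECTIONS:
        lines.append(heading)
        lines.extend(f"  - {q}" for q in questions)
        lines.append("")
    return "\n".join(lines)


def missing_sections(answers: dict[str, str]) -> list[str]:
    """List section headings that have no written (non-blank) answer yet."""
    return [heading for heading, _ in SECTIONS if not answers.get(heading, "").strip()]


print(render_template("Checkout API outage"))  # hypothetical incident
print(missing_sections({"Overview": "Checkout errors for 40 minutes", "Timeline": ""}))
# ['Root cause', 'Response actions', 'Timeline', 'Lessons learned']
```

Treating whitespace-only answers as missing means a section cannot be "completed" just by leaving a placeholder in it.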
During incident response, the team is 100% focused on restoring service. It cannot, and should not, waste time and mental energy thinking about how to do something more optimally or performing a deep dive into the root cause of an outage. That's why post-mortems are essential: they provide a peacetime opportunity to reflect once the issue is no longer affecting users' experience. The post-mortem process drives focus, instills a culture of learning, and identifies opportunities for improvement that would otherwise be lost entirely.
By requiring the team to explicitly dedicate time to discussing and documenting lessons learned while the incident is still fresh in everyone's minds, a post-mortem lets the team focus on the right thing at the right time. The team does not sacrifice its ability to respond quickly in the midst of the fire, nor does it lose the opportunity to collaboratively understand how to improve its infrastructure and processes across every step of the response.
Post-mortems matter because learning together establishes the right culture around failing forward, with iterative and continuous improvement.
A blameless post-mortem is critical for understanding failures because it focuses on how a mistake was made instead of who made it. “You ignore the ‘this person did that’ part,” explains PagerDuty Engineering Manager Arup Chakrabarti. “What matters most is the customer impact, and that’s what you focus on.” Many leading organizations rely on blameless post-mortems, including Etsy, a pioneer of the practice, to ensure post-mortems have the right tone: eliminating the fear of punishment empowers engineers to give truly objective accounts of what happened.
Some argue that a truly blameless post-mortem may not be possible because humans are hardwired for blame. They advocate “blame-aware” post-mortems, in which teams acknowledge the instinct to blame but focus their attention on actionable takeaways instead.
Whichever terminology resonates with your team, the key point is that post-mortem discussions should be safe spaces in which teams can be completely honest and oriented around improving for the future instead of blaming others for the past.
PagerDuty offers a completely free post-mortem handbook that shares industry best practices and includes a post-mortem template. Use it to formalize your own post-mortem process and make it as easy as possible for your team to respond to issues. Even better, post-mortems are now part of the PagerDuty platform: sign up for a free 14-day trial and streamline the entire post-mortem process with automated timeline building, collaborative editing, actionable insights, and more.