March 26, 2019
In part 2 of our postmortem series, we dig into how to establish a culture of continuous learning, from getting leadership on board to invoking a cultural shift.
Culture is the way we do things together. It’s the secret sauce that results in happy, healthy teams that consistently meet their goals. It’s also the hardest thing to define, cultivate, and change in an organization. True cultural change requires more than creating and communicating policies. It takes collaboration, persistence, and experimentation.
At PagerDuty we’re big fans of Agile methodologies and DevOps practices. We’ve applied the tenet of continuous improvement beyond software development to culture change, too. You can shift your culture in the right direction through iterative assessment and collaboration.
Which brings me to postmortems, another core DevOps practice. A successful postmortem is more than just a process—it’s based on a culture of honesty, learning, and accountability. Culture change requires management buy-in, but you can lead culture change no matter your role.
Before embarking on a total overhaul of your company’s processes, it’s important to take stock of where you’re starting from. Do you currently use a postmortem process to debrief after incidents? What are the steps you follow? Who is involved? How do conversations about failure typically go? If your team has had these discussions, then you may be on your way to transitioning into a culture of blameless postmortems.
Many companies will have a meeting after a major incident to review what happened. You may find that in these discussions a few individuals end up taking the brunt of the responsibility for the incident. Generally, everyone walks away knowing a bit more, hopefully with a plan for how to avoid the issue in the future. More importantly, a few people walk away feeling pretty crummy, which you want to avoid.
Take a look at our step-by-step guide for performing postmortems to give your team a quick audit. What are you currently doing, what are you doing that may need tweaking, and where are you not doing anything?
Whether you’re introducing postmortems as an entirely new practice at your organization or working to improve an existing process, culture change is hard. Traditionally, we think of changes being driven down by management, but bottom-up changes are usually more successful. No matter what your role is, the first step to introducing a new process is getting buy-in from leadership and individual contributors.
Approaching leadership with thoughtful reasoning for this shift will help reinforce the importance and impact this change can have. Here are some key talking points for that conversation:
Additionally, though it may sound silly, be sure not to blame management for blaming when selling a new blameless postmortem process to them. Instead, emphasize that practicing blamelessness is difficult for everyone involved. Ensure that your leaders are on board with the team holding everyone accountable to the new process by calling each other out when blame is observed in response to failure. It’s important to get leadership’s affirmation that they will be receptive to that feedback if and when they accidentally suggest blame after an incident.
Your objective is to secure leadership’s commitment to a culture of continuous improvement.
Pro tip: See if you can map the concept of blamelessness to a company value. For instance, one of our cultural values at PagerDuty is specifically around embracing disruption and continuous improvement, and always focusing on learning. This concept of a blameless postmortem can directly map to supporting those values.
Now that leadership is in, you’re well on your way to implementing a major cultural change at your organization! The next step is to secure buy-in from the individual contributors on your team. Keep in mind, however, that they may still be fearful of being blamed for incidents. That fear will not dissolve with the wave of the policy wand. Be sure to share that you have commitment from management that no one will be punished in any way after an incident. Build trust with your colleagues by agreeing to work together to become more blame aware and kindly call each other out when blame is observed.
(Read more about the importance of psychological safety within groups.)
Culture change does not happen overnight. Iteratively introduce new practices to your organization by starting small—for example, by sharing successful results of experimenting with new practices, then slowly expanding those practices across teams.
How to get started:
Remember: Keep it simple at first to build the blameless confidence of the group. Experiment with what works best for the team and iterate on your next round.
Instigating a cultural shift for an organization, or even for a single team, takes significant energy. Change is sometimes rejected on the mere perception that it will be hard, before it has been tried, tested, and ultimately embraced.
To overcome this, organizations should leverage the natural power of information sharing to spread change. According to Puppet’s 2018 State of DevOps Report, operationally mature organizations adopt practices that promote sharing. Humans naturally want to share their successes, and when others see something that’s going well, they instinctively want to replicate that success.
Sharing incident reports may initially seem counterintuitive—it seems like you’re sharing a story of failure rather than success. Quite the contrary: Practicing blameless postmortems allows teams to learn from failure and improve systems to reduce the prevalence of failure.
Reframe incidents as learning opportunities that result in concrete improvements rather than as someone’s personal failure. This increases morale, which in turn increases employee retention and productivity.
By freely sharing information and encouraging transparency you are supporting an environment that cultivates accountability. What happens after the postmortem is key to the health of the system. Setting up an SLA for when postmortem action items are expected to be completed will help the team quickly assign and prioritize tasks. It will also empower the team to jump into action without needing to wait for permission.
Pro tip: Be sure to communicate this SLA to all of engineering and make sure it is documented for future reference.
At PagerDuty, our VP of Engineering has set the expectation that high-priority action items needed to prevent a Sev-1 incident from recurring should be completed within 15 days after the incident. Action items from a Sev-2 incident should be addressed within 30 days.
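To make an SLA like this easy to apply, a team might compute due dates automatically when action items are created. The following is a minimal sketch in Python, not a description of PagerDuty's tooling: the 15-day and 30-day windows come from the expectations above, while the severity keys and function names are hypothetical and chosen only for illustration.

```python
from datetime import date, timedelta

# SLA windows in days, taken from the expectations described above.
# The keys ("sev1", "sev2") are a hypothetical naming convention.
ACTION_ITEM_SLA_DAYS = {"sev1": 15, "sev2": 30}


def action_item_due_date(severity: str, incident_date: date) -> date:
    """Return the date by which postmortem action items should be completed."""
    try:
        days = ACTION_ITEM_SLA_DAYS[severity.lower()]
    except KeyError:
        raise ValueError(f"no SLA defined for severity {severity!r}") from None
    return incident_date + timedelta(days=days)


def is_overdue(severity: str, incident_date: date, today: date) -> bool:
    """Return True if the SLA window has already passed as of `today`."""
    return today > action_item_due_date(severity, incident_date)
```

For example, under these assumptions `action_item_due_date("sev1", date(2019, 3, 1))` yields March 16, 2019, and the same incident classified as Sev-2 yields March 31, 2019. Keeping the windows in a single mapping means the documented SLA and the tooling can be updated together if leadership changes the expectation.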
Changing your company’s culture for good is some of the hardest work to take on. It’s incredibly nuanced, requires a high level of empathy, and can be emotionally exhausting. It’s also some of the most important and rewarding work to do for your organization as promoting a blameless culture of continuous learning leads to happier teams and better software.
You can shift your organization in the right direction by applying concrete steps of assessment, collaboration, communication, and experimentation:
To learn more about how to adopt blameless postmortems, check out our comprehensive Postmortem Guide. We’d love to hear how you approach culture change and spread blamelessness. Head to the forums and share with your community!