August 15, 2019
Most technical incident response plans account for stakeholder communications, both for internal teams and for external customers. But what we've learned from our customers at PagerDuty is that a painful and expensive gap in alignment between IT and business teams still remains. To close that gap, we need to focus on what incident response means for business teams.
Product solutions can address some of the challenges of stakeholder communications. But to further support effective response across an organization, we also need process-oriented solutions that change the way we work today. We've developed two new operational guides that provide frameworks for creating clear action plans that mobilize business teams in alignment with technical teams when incidents occur.
Is your CEO on call when your web services go down?
No? Well, they should be!
Perhaps that is an extreme example, but the business does have its own incident responders (the CEO in the example above), whether or not your organization currently treats them as such. The problem is that these business stakeholders typically act on their own, even though they would benefit greatly from coordinating their response with technical responders.
There's no question that technical incidents have a quantifiable impact on your company's revenue and reputation. When business services are disrupted, customers become upset, and this is most noticeable when disruptions are highly visible and prolonged.
For example, do you remember a time when a critical office productivity application was unavailable for a few hours? Or that one morning when your favorite social networking site was super slow? Without a proper response in place, events like these can cause reputational damage and significant revenue loss. To help mitigate both, business stakeholders rely on updates from technical incident responders so that they can respond proactively to customer concerns.
If technical incident response has taught our industry anything, it's that navigating disasters effectively requires explicitly defined response plans with clear roles and responsibilities that are practiced frequently. In short, we have to prepare for failure by constantly rehearsing for it and learning from it when it happens.
Business incident response is an application of PagerDuty’s incident response framework for non-technical responders. Using this framework, business responders can develop a set of explicit steps they can take to mitigate business impact when technical incidents happen.
Just like with technical incidents, business response varies depending on the severity of an incident. Minor incidents may simply require more verbose stakeholder communications and notifications, whereas major incidents may require waking up your executive and legal teams in the middle of the night.
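To make severity-dependent business response a little more concrete, here is a minimal Python sketch of a business response plan expressed as data. The severity levels, the stakeholder group names, and the `business_response` helper are all hypothetical illustrations, not part of PagerDuty's product or of the guides themselves; each organization would define its own levels and groups.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical scale: a lower number means a more severe incident.
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3


@dataclass(frozen=True)
class BusinessResponse:
    notify: tuple[str, ...]  # groups that receive stakeholder updates
    page: tuple[str, ...]    # groups woken up immediately, even at night


# Hypothetical plan: group names are illustrative placeholders only.
PLAN: dict[Severity, BusinessResponse] = {
    # Minor incident: more verbose communications, nobody gets paged.
    Severity.SEV3: BusinessResponse(notify=("support",), page=()),
    Severity.SEV2: BusinessResponse(
        notify=("support", "marketing"),
        page=("customer_liaison",),
    ),
    # Major incident: executive and legal teams are paged right away.
    Severity.SEV1: BusinessResponse(
        notify=("support", "marketing", "sales"),
        page=("customer_liaison", "executive", "legal"),
    ),
}


def business_response(severity: Severity) -> BusinessResponse:
    """Look up the pre-agreed business actions for a given severity."""
    return PLAN[severity]


if __name__ == "__main__":
    for sev in Severity:
        plan = business_response(sev)
        print(sev.name, "page:", plan.page or "nobody", "| notify:", plan.notify)
```

Writing the plan down as reviewable data, rather than leaving it as tribal knowledge, is one way to make it something stakeholders can walk through and rehearse ahead of time.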
Those major incidents occur with little to no warning. So the question is: should you wait for a middle-of-the-night surprise to figure out which critical decisions need to be made to mitigate material damage to your brand? Or should that response be prepared and reviewed with critical stakeholders, rehearsed to identify and close gaps, and backed by clear on-call rotations and escalation policies defined months in advance?
I’m going to wager most people see the value of having defined response plans with clear roles, expectations, and owners ahead of time, especially during a crisis.
Whether managing major or minor incidents, you need a plan for how the business will respond. Our new Business Incident Response operational guides provide a framework for developing clear action plans that mobilize business teams in alignment with technical teams when incidents occur. These two guides will help close the response gap between IT teams and business teams.
Our existing technical incident response guide defines responsibilities for a Customer Liaison and an Internal Liaison during an incident. The new Internal Stakeholder Communications guide is a supplemental handbook that more explicitly fleshes out effective communication practices for your internal teams. It details distribution mechanisms and messaging considerations for different mediums, such as using mechanisms that allow flexible opt-in to notifications (e.g., an internal status page) rather than more static mechanisms like email distribution lists. It also covers how to present relevant views to different parts of the business.
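The difference between an opt-in mechanism and a static distribution list can be sketched in a few lines of Python. The `InternalStatusPage` class below is a hypothetical, in-memory stand-in for an internal status page (it is not PagerDuty's status page API): stakeholders subscribe to the services they care about, and an update reaches only those subscribers. The static list, by contrast, sends every update to everyone on it. All people and service names are made up for illustration.

```python
from collections import defaultdict


class InternalStatusPage:
    """Hypothetical opt-in notification hub: people subscribe to the
    business services they care about instead of sitting on a fixed list."""

    def __init__(self) -> None:
        self._subscribers: dict[str, set[str]] = defaultdict(set)

    def subscribe(self, person: str, service: str) -> None:
        self._subscribers[service].add(person)

    def unsubscribe(self, person: str, service: str) -> None:
        # discard() is a no-op if the person was not subscribed.
        self._subscribers[service].discard(person)

    def post_update(self, service: str, message: str) -> dict[str, str]:
        # Deliver the update only to people who opted in to this service.
        return {person: message for person in sorted(self._subscribers[service])}


# For contrast: a static email distribution list sends every update to
# every member, regardless of which service is affected.
STATIC_LIST = ["ceo", "head_of_sales", "support_lead", "legal"]


def email_blast(message: str) -> dict[str, str]:
    return {person: message for person in STATIC_LIST}


if __name__ == "__main__":
    page = InternalStatusPage()
    page.subscribe("support_lead", "checkout")
    page.subscribe("head_of_sales", "checkout")
    page.subscribe("head_of_sales", "reporting")
    print(page.post_update("checkout", "Checkout is degraded; investigating."))
    print(email_blast("Checkout is degraded; investigating."))
```

With opt-in subscriptions, each stakeholder adjusts what they receive without anyone maintaining a central list, which is the kind of flexibility the guide contrasts with static email lists.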
The Business Incident Response guide focuses on mitigating business impact when responding to the most severe technical incidents. The guide delves into the business response operations that should occur in parallel with technical response operations. Business incident response consists of two primary functions: 1) transitioning normal business operations into emergency mode and 2) managing internal and external stakeholder communications. The guide is an in-depth framework that defines key roles and procedures both during and after a severe technical incident, and it is meant to help you develop a business response plan for (hopefully) rare, extreme-case disasters.
These guides are meant to help you get started developing business response practices that align with the tried-and-true practices developed by technical practitioners. And while they cover two particularly painful scenarios that many organizations struggle with, we recognize that this is just the tip of the iceberg when it comes to defining the many incident response scenarios that apply to the business side of any company.
What other pain points do you struggle with? How does business incident response work in your organization today? We’d love to hear from you and get a chance to share and compare experiences that help develop these guides further. Stop by our Community to drop us a comment or reach out to me directly via Twitter, and let us know what you think!