When incident response requires business response, who should you notify?
Incident response can range from a single on-call engineer hopping online to resolve a problem to a massive cross-team effort that brings in even the most senior technical leadership (CTO, CISO, or CIO). Either way, incident response teams are lucky when they’re able to resolve issues before a customer is aware. But in the cases where there is customer impact, other stakeholders like sales and customer service need to be informed and updated as well.
Incident response is a technical response to an unforeseen problem occurring within a system. Subject matter experts (SMEs) are alerted to a problem and jump into the fray to diagnose whatever’s wrong, fix it, and return the system to normal.
In today’s highly digital world, however, incidents in customer-facing applications affect more than just the technical teams who respond to them. Customers are, of course, wondering why their tool, app, portal, game, or shop is no longer responding. And internally, leadership and other line-of-business stakeholders are looking for answers.
While this “executive swoop and poop” might be seen as annoying or overbearing for responders who have to stop what they’re doing to give status updates, the communication cannot be overlooked. Communication is a key part of the incident response process, especially with teams adopting hybrid/remote modes of work. And there’s a way to do it so that both sides of the house, technical and non-technical, feel supported and empowered to see the incident through.
Business incident response is a non-technical response framework for mitigating business impact from severe technical incidents. Business incident response consists of two primary functions: transitioning normal business operations into emergency business operations mode, and managing proactive communications with both external customers and internal stakeholders. For this blog post, we’ll focus on how to manage communications to improve both customer and team experience.
Who needs to know?
During an incident, you’ll have several different audiences who need varying levels of detail about what’s going on. It’s unlikely that customers will need a deep technical understanding of what’s happening behind the scenes. It’s also unlikely that executives will be satisfied with a short customer-oriented communication. It’s important to tailor your messages based on your audience.
While there’s no one-size-fits-all approach to creating communications, there are some people you should consider communicating with if the incident is high priority. Here are five audiences to keep in mind:
- Executives: Executives are the most common stakeholders who will want to be informed of high priority issues, though they may not be the most technical stakeholders. You’ll want to give them the bottom line up front. One of the most important things they’ll care about is customer impact. They also might care about SLA violations if applicable to the business. You should expect to update them regularly on progress. If you have any needs or resource requirements, make those asks early and often so you have the support you need.
- Customer service teams: These teams are often on the front lines of incidents, yet can feel left in the dark about how the resolution process is going. Customer service teams want to tell customers when service will return to normal. It’s key to ensure that these teams are among the first to hear about new incident developments. This helps bridge communications between customers and engineering and fosters trust. This added touch can be the difference between churn and renewal, especially in key accounts.
- Other technical teams: Sometimes, an incident on a service you own affects other teams. They can’t actively do anything about it, but they still need to be aware. In high priority incidents, you might want to share with them the customer impact, any relevant progress updates, and ask for any assistance you may need from them – whether more hands on deck or even just a heightened awareness for anomalies in the system. For this update, you can go into deeper technical details if it will help the team understand the context better, since the audience is more likely to understand what’s happening from a system perspective.
- Other line of business teams: Teams like marketing, sales, legal, and finance may need to know about incidents that affect how they conduct business. Marketing may want to stall a campaign that would drive prospects to a broken website. Sales may want to postpone demos. Legal and finance might want to get ahead of SLA penalties. These teams probably won’t care so much about the technical side of things, and may not have any resources that would be helpful to you. But they still need to understand customer impact and receive regular progress updates.
- Customers: Customer-impacting incidents can erode trust. One way to maintain some of that trust is by openly communicating with customers about incidents that affect them. Depending on how your customer base may be split, it may make sense to do this in two categories. One category is your strategic customer group. The other category is the rest of your customer base. The strategic customers might want more frequent and more in-depth updates. The rest of your customer base might simply need an acknowledgement update sharing that you understand there’s an incident and are working to fix it, and an update that shares when service has returned to normal. This can vary by organization depending on your standard tone and brand connection.
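The audience list above can be captured as a small lookup so that whoever is sending updates does not have to recall it mid-incident. The following is a minimal sketch, not a prescribed implementation: the audience names and focus items paraphrase the list, while the `AUDIENCE_FOCUS` table, the `audiences_to_notify` function, and the selection rule (internal audiences for high priority incidents, customers only when there is customer impact) are illustrative assumptions that each organization would adjust.

```python
# Hypothetical lookup paraphrasing the audience list above; adjust to your organization.
AUDIENCE_FOCUS = {
    "executives": ["customer impact", "SLA violations", "resource asks"],
    "customer service": ["latest developments", "when service will return to normal"],
    "other technical teams": ["customer impact", "progress", "technical details", "requests for help"],
    "line of business teams": ["customer impact", "progress"],
    "strategic customers": ["acknowledgement", "frequent in-depth updates", "resolution notice"],
    "other customers": ["acknowledgement", "resolution notice"],
}


def audiences_to_notify(high_priority: bool, customer_impact: bool) -> list[str]:
    """Return the audiences to update.

    Assumed rule (not from any specific tool): internal audiences are
    notified for high priority incidents, and customers are notified
    only when the incident affects them.
    """
    selected: list[str] = []
    if high_priority:
        selected += [
            "executives",
            "customer service",
            "other technical teams",
            "line of business teams",
        ]
    if customer_impact:
        selected += ["strategic customers", "other customers"]
    return selected


if __name__ == "__main__":
    for audience in audiences_to_notify(high_priority=True, customer_impact=True):
        print(f"{audience}: {', '.join(AUDIENCE_FOCUS[audience])}")
```

Keeping the focus items next to each audience makes it easier to check a draft update against what that audience actually cares about before it goes out.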
Now you have a better idea of whom you may need to update. But what’s the best way to organize this internally? It can be stressful to coordinate this communication on the fly, especially when resolving the incident takes precedence.
Prepping internally for business incident response
In most major incidents, you have someone who fills the role of the communications lead. This person should understand how to send out status updates to stakeholders to ensure that the business is moving in the right direction to mitigate any additional impact.
During other incidents, this communications lead is a separate person from a business incident response lead. The communications lead deals with all technical communications between teams and engineering or operations leadership, while the business incident response lead communicates with all the non-technical stakeholders. These two people need to work closely together yet have distinct swimlanes so that communication stays clear and streamlined. In either case, communications should be prepared as best as possible ahead of time.
One possible way to do this is with templates. Templates give you enough structure to understand what you need to communicate for each audience, but are flexible enough to edit if circumstances change. How many templates you create depends on what your organization deems appropriate. Some organizations will have one template for a few key audiences. Others will have templates for each of those audiences based on priority or incident stage. How detailed you need to be will depend on what’s right for your internal stakeholders and customers.
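As one illustration of what a template can look like in practice, here is a minimal sketch using Python’s standard `string.Template`. The wording of each template, the field names (`priority`, `impact`, `next_update`, and so on), and the `render_update` helper are all hypothetical; your own templates should reflect what your stakeholders have agreed to.

```python
from string import Template

# Hypothetical per-audience templates; the wording and field names are
# illustrative assumptions, not taken from any specific tool.
TEMPLATES = {
    "executives": Template(
        "[$priority] $title\n"
        "Customer impact: $impact\n"
        "Status: $status\n"
        "Asks: $asks\n"
        "Next update: $next_update"
    ),
    "customers": Template(
        "We are aware of an issue affecting $service and are working to fix it. "
        "Next update: $next_update"
    ),
}


def render_update(audience: str, **fields: str) -> str:
    """Fill in the template for one audience.

    Raises KeyError if the audience is unknown or a required field is
    missing, so an incomplete update is caught before it is sent.
    """
    return TEMPLATES[audience].substitute(**fields)


if __name__ == "__main__":
    print(render_update("customers", service="checkout", next_update="14:30 UTC"))
```

Using `substitute` rather than `safe_substitute` is a deliberate choice in this sketch: a missing field fails loudly instead of sending a message with a raw placeholder in it.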
These templates shouldn’t be created in a vacuum, either. Instead, you should consider vetting them with the intended internal stakeholders to ensure they meet their needs. Additionally, you may test them with your customers as incidents occur and alter the templates based on feedback. Lastly, you may have teams internally who want to approve these templates before you use them. Often, legal and the executive team will want to view these templates and sign off on them to ensure that communications go as smoothly as possible in the event of a large incident.
Building trust and transparency with stakeholders
Having a prepared business incident response plan in place can help organizations communicate key information better internally. Additionally, it can help preserve customer trust during a difficult situation by proactively sharing information and developments as the incident proceeds.
The above processes are good guidelines to begin with, but for larger or more mature organizations, you’ll need a tool that helps coordinate and simplify the process. PagerDuty helps teams across the business coordinate a streamlined response and ensure customer impact is minimized for every moment, especially when seconds matter.