Service-Based vs. Team-Based Approach: Which Is Better?
How is the incident response process set up at your organization?
At PagerDuty, our approach is to look holistically at your infrastructure, your customer-facing applications, and your products. We describe these items as “services” that roll up into a “business service.” This setup allows teams to better manage these services so that when incidents do happen, responders can gain context much faster. But how?
First, let’s talk a bit about services. Services are built to last, and they typically outlive the teams that originally developed them. In other words, people come and go, and teams organize and reorganize all the time. And shifting team-to-service ownership doesn’t just happen once a year or just during reorgs. People take on new services, inherit old ones, and trade ownership even for just a few weeks during a specific project.
So if you orient your incident response platform to the teams and then to the services (or worse, no services at all), you’ll have to redo your whole incident response setup every time team reorgs happen. Plus, you lose institutional knowledge and important analytic data along the way as teams change. Sounds like a nightmare, right?
That’s why PagerDuty has built our platform to make it easy for organizations to orient their incident management processes around services. Doing so allows teams to grow and change over time and provides better visibility into service health and trends, all without impacting how those services are delivered, maintained, and improved. Ultimately, that helps reduce extended downtime and negative customer impact.
Gain Better Visibility Into Business Impact, Services Health, and Trends Over Time
If you’re like most companies, you likely orient your incident process setup, production support, and configuration around teams; i.e., you take a team-based approach. This means you likely have a mix of ITSM, DevOps, and ITOps teams, with business/technical teams defining business services, and many other teams that own different services.
So how do you move from a team-based setup to a service-based setup? It’s easier than most people think.
First, identify your top-level business services: the distinct parts of the products or applications that your customers interact with to perform specific tasks or achieve specific outcomes. For example, “login,” “shopping cart,” and “search” are all business services. Then, for each business service, identify the technical services that contribute to it. Each technical service should ideally be owned and developed by one team at a time, even if multiple teams contribute to maintaining it long term.
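To make that mapping concrete, here is a minimal sketch of how business services and their technical services could be modeled. The business service names come from the examples above; the technical service names, team names, and the `owning_teams` helper are hypothetical, and none of this is PagerDuty's actual data model or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TechnicalService:
    name: str
    owning_team: str  # ideally exactly one owning team at a time


@dataclass
class BusinessService:
    name: str
    technical_services: List[TechnicalService] = field(default_factory=list)


# Hypothetical catalog: business services roll up the technical services behind them.
catalog = [
    BusinessService("login", [
        TechnicalService("auth-api", "identity-team"),
        TechnicalService("session-store", "platform-team"),
    ]),
    BusinessService("shopping cart", [TechnicalService("cart-api", "checkout-team")]),
    BusinessService("search", [TechnicalService("search-api", "discovery-team")]),
]


def owning_teams(business_service: BusinessService) -> List[str]:
    """Return the teams that own the technical services behind a business service."""
    return sorted({ts.owning_team for ts in business_service.technical_services})
```

Calling `owning_teams(catalog[0])` lists the teams behind “login,” which is the kind of lookup that stays stable even when team membership changes.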
Once you’ve identified your business services and the corresponding technical services that support them, you can do a lot of interesting things. For instance, teams can see what’s happening in real time across the business to better understand whether an issue is isolated or has a broader impact, which allows for a better-coordinated response when it does span multiple teams and services.
When events are routed to distinct, separate services that reflect the services in your environment, it’s easier for everyone to communicate about what’s going on. On top of that, responders get insight into other incidents going on across your entire infrastructure.
For example, let’s say you’re on the Site Reliability Engineering team and receive a notification about a service being down. But another responder on the database team also got the same notification. Because you can now look at associated incidents across multiple services, you can see that it’s a database issue, so you can stop working on the issue since you know the database team will take care of it.
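The scenario above can be sketched as a simple lookup: given the service you were paged about, find open incidents on the services it depends on. The service names, incident records, and the `UPSTREAM` dependency map are all hypothetical and only illustrate the idea; they are not how any particular platform stores this data.

```python
# Hypothetical map: technical service -> services it depends on.
UPSTREAM = {
    "checkout-api": ["orders-db", "payments-gateway"],
    "orders-db": [],
    "payments-gateway": [],
}

# Hypothetical open incidents, each routed to a distinct service.
open_incidents = [
    {"id": "INC-1", "service": "checkout-api", "team": "sre"},
    {"id": "INC-2", "service": "orders-db", "team": "database"},
]


def upstream_incidents(service, incidents, upstream):
    """Return open incidents on the services that `service` depends on."""
    deps = set(upstream.get(service, []))
    return [inc for inc in incidents if inc["service"] in deps]


related = upstream_incidents("checkout-api", open_incidents, UPSTREAM)
```

Here `related` contains the database team's incident on `orders-db`, which tells the SRE responder that the root cause is probably already being handled elsewhere.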
Align With the Business and Business Needs
Most companies today still have a team-based setup for their incident management process. And while it initially is easier to set up, it actually becomes a challenge to manage in the long term as you grow and scale. Why?
The silos produced with this approach create confusion for responders. It’s more difficult to orchestrate an effective response when an incident happens because responders need to spend extra time digging into what’s actually affected (“Am I being paged about Service A or B? What level of response is required?”) before they figure out what to do. Remember: Most teams own at least 10 different services, and organizing your event information so that alerts are routed to distinct services will help them better understand what’s going on.
In contrast, organizations taking a services-first approach to setup bridge technical and business services, which makes the impact on the business and customers explicit because it provides context on the importance of a given service. It also provides a common language for communicating, helping organizations automate the sharing of concise and actionable status updates with those who need to know. (E.g., Service A supports our quote-to-cash system and requires an elevated response in the event of an incident compared to Service B, which is a non-essential, internal service with no SLAs.)
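As a rough illustration of how that business context could drive the level of response, consider the sketch below. The field names (`supports`, `essential`, `has_sla`) and the two response levels are assumptions made for this example, not a real configuration schema.

```python
# Hypothetical business context attached to each technical service.
SERVICE_CONTEXT = {
    "Service A": {"supports": "quote-to-cash", "essential": True, "has_sla": True},
    "Service B": {"supports": None, "essential": False, "has_sla": False},
}


def response_level(service_name):
    """Pick a response level from the business context of a service."""
    ctx = SERVICE_CONTEXT[service_name]
    if ctx["essential"] or ctx["has_sla"]:
        return "elevated"
    return "standard"
```

With this context in place, a responder paged for Service A knows right away that an elevated response is expected, while Service B can follow the standard process.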
But a Team-Based Setup Is SO Much Easier
As I mentioned earlier, yes, using a team-based approach when configuring and setting up your incident management process is initially easier; however, the negatives outweigh the positives in the long run. For instance, with a team-based approach, you wouldn’t be able to:
- Assess the business impact of incidents in real time
- Analyze the impact your services are having on the reliability or stability of your application
- Accurately assess the blast radius of issues, which is important since they typically span across multiple services
- Quickly determine which business stakeholders need to be notified during major incidents
Additionally, a team-based approach lacks the flexibility to absorb changes as teams change and reorganize. You’ll need to constantly rearchitect your team and service configuration whenever there’s an organizational change, which takes away from your ability to innovate.
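To show why a service map makes blast radius easier to assess, here is a small sketch that walks a dependency graph from a failed service to everything that depends on it, then rolls the result up to business services. The technical service names and dependency data are hypothetical, and the traversal is an ordinary breadth-first search rather than any specific product feature.

```python
from collections import deque

# Hypothetical map: technical service -> services it depends on.
SERVICE_DEPENDENCIES = {
    "search-api": ["catalog-db"],
    "cart-api": ["catalog-db", "session-store"],
    "auth-api": ["session-store"],
    "catalog-db": [],
    "session-store": [],
}

# Hypothetical map: business service -> technical services that support it.
BUSINESS_SERVICES = {
    "search": ["search-api"],
    "shopping cart": ["cart-api"],
    "login": ["auth-api"],
}


def blast_radius(failed_service):
    """Return the failed service plus every service that depends on it, directly or not."""
    # Invert the dependency map so we can walk from a service to its dependents.
    dependents = {}
    for service, deps in SERVICE_DEPENDENCIES.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)

    affected = {failed_service}
    queue = deque([failed_service])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, []):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


def impacted_business_services(failed_service):
    """Return the business services touched by a failure, sorted by name."""
    affected = blast_radius(failed_service)
    return sorted(
        name for name, technical in BUSINESS_SERVICES.items()
        if affected & set(technical)
    )
```

In this example, a failure in `catalog-db` reaches `search-api` and `cart-api`, so the “search” and “shopping cart” business services are the ones whose stakeholders need to hear about it.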
So Which Approach Is Right for You?
Before you decide whether to take a team-based or services-based approach to setting up your incident management process, first ask, “Why am I setting up on call to begin with?”
The most likely answer: You’re setting it up because you need a team or someone to respond quickly when an incident arises. And we’ve seen a lot of companies set up their configuration with a team-first approach because, well, you have a team, you want to make sure everyone is on the on-call rotation, and it’s fast and easy to set it up this way.
But if you take a step back, a more optimized approach is to think about the services your teams support because this allows you more flexibility with regard to changes—the teams can change over time, but the services they support can remain the same.
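A tiny sketch of that flexibility: if incident history is keyed by service and ownership is stored separately, a reorg only changes one mapping. The service name, team names, incident IDs, and the `reassign` helper are hypothetical.

```python
# Hypothetical records: history is keyed by service, not by team.
incident_history = {"search-api": ["INC-101", "INC-117"]}
ownership = {"search-api": "discovery-team"}


def reassign(service, new_team):
    """Hand a service to a new team without touching its incident history."""
    ownership[service] = new_team


# A reorg moves search-api to another team; the service and its history stay put.
reassign("search-api", "platform-team")
```

After the reassignment, the new owners inherit the full incident history for `search-api`, so trends and institutional knowledge survive the change.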
Don’t get me wrong—teams are very important. But the reason why we recommend that you orient your incident management configuration and setup around services first is that doing so provides you with the:
- Visibility needed to understand the health of services and improve processes
- Insights to see what’s trending to identify hotspots
- Ability to easily and quickly see which team supports which services vs. having to first go through multiple teams and understand those layers before getting to the service
At the end of the day, what the business really cares about is being able to do business—and anything that impacts that needs to be quickly addressed. And the best way to do that is via a services-based approach because it gives you the ability to understand who needs to work on what and how to prioritize it instead of wasting time digging through layers of teams to get to the services.