PagerDuty Blog

How service ownership can help you grow your operational maturity

Digital operations management is about harnessing the power of data to act when it matters the most. It’s also about having the right processes and procedures to support teams when every second is critical. Maturing your digital operations takes time, iteration, and commitment. The change won’t happen overnight. But, if you put in the effort, you’ll reap outsized benefits. You’ll be able to learn from incidents and proactively improve your services over time.

One way to improve your digital operations maturity is to adopt service ownership. In this blog post, we’ll share what service ownership is, how to make the transition once your organization announces the pivot, and how your teams will grow in maturity along the way.

So, what is service ownership?

Service ownership means that people take responsibility for supporting the software they deliver at every stage of the software/service lifecycle. That level of ownership brings development teams much closer to their customers, the business, and the value being delivered.

Benefits of service ownership are varied, but here are some of the most important:

  • Your teams will know who is on call and when. This helps them feel more confident in on-call, and builds accountability for the services they build.
  • Service reliability improves. When a team focuses on a particular service, trends are easier to notice. Issues with reliability bubble up faster, and improvements can be prioritized.
  • Customers experience less service degradation and downtime. Happier customers mean a more successful business. With service ownership, you can respond to incidents faster and can even resolve them before any significant customer impact.

Many organizations make this move to service ownership to innovate faster and gain a competitive advantage. The flexibility of service ownership allows you to pivot in new directions and adapt to change at a rapid pace. But this isn’t something that can be completed in isolation. Service ownership is part of a new cultural and operating model that must be adopted organization-wide to be successful. Let’s look at how to get started.

How can I adopt service ownership?

Like any worthwhile culture change, service ownership will not be an initiative you can complete within a single sprint. And you’ll need the whole organization to move in this direction for this initiative to succeed. For the purposes of this blog post, we’ll assume that your organization is ready to adopt service ownership, and your team is looking for the best way to make the change. To get started, there are a few things you can do.

  • Create a list of services. If you haven’t created a list of all the services in your system, work cross-functionally with other teams to understand all the moving pieces. While eventually you’ll want to include business services, you should take it step-by-step and focus on those owned by technology teams first. Once you have a list of services, it’s time to start on the “ownership” part.
  • Define the team that will own the service. Start by considering who is responsible for the service you are defining. A service should be wholly owned by the team that will be supporting it via an on-call rotation. If multiple teams share responsibility for a service, it’s better to split that service into separate services (if possible). Some organizations call this “service mitosis”: splitting one cell into two separate cells, each looking very similar to the former whole. There are several ways to decide where to draw the line. For example, you can split services based on team size or on the volume of code each team manages. You can read more about how we did that at PagerDuty.
  • Set up the on-call rotation for this service. Ensure that the people on the team share responsibility for ensuring availability of the service in production. Create on-call schedules that rotate individuals and back-up responders on a regular cadence, as well as policies that include escalation contacts.
  • Ensure the team is sized correctly. Services should be granular enough that the members of the owning team can quickly identify the source of problems. A service can be scoped too broadly, so that the knowledge needed to support it goes beyond what the team holds. But a service can also be scoped too narrowly. For example, if two microservices effectively behave as one, and fixing a problem on one means also fixing it on the other, then it might make sense to combine them.
  • Start small. It’s important to roll this change out incrementally. That way, you can show success early and inspire other teams to adopt this mindset. This also gives teams time to learn from others before implementing service ownership themselves. Ideally, the change should roll out smoother with each team.
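As a rough illustration of the first three steps, here is a minimal Python sketch. It assumes ownership claims are available as (service, team) pairs and that responders rotate weekly; the function names, the sample services, and the weekly cadence are all hypothetical choices for this example, not a description of how PagerDuty models schedules.

```python
from datetime import date, timedelta


def shared_services(claims):
    """Return services claimed by more than one team.

    `claims` is an iterable of (service, team) pairs. Services in the
    result are candidates for splitting so each has a single owner.
    """
    teams_by_service = {}
    for service, team in claims:
        teams_by_service.setdefault(service, set()).add(team)
    return sorted(s for s, teams in teams_by_service.items() if len(teams) > 1)


def on_call_schedule(responders, start, weeks):
    """Build a weekly rotation of (week_start, primary, backup) tuples.

    The backup for each week is the next person in the list, so every
    responder takes both roles over time.
    """
    if len(responders) < 2:
        raise ValueError("need at least two responders for primary and backup")
    schedule = []
    for week in range(weeks):
        primary = responders[week % len(responders)]
        backup = responders[(week + 1) % len(responders)]
        schedule.append((start + timedelta(weeks=week), primary, backup))
    return schedule


# Hypothetical data: "checkout" is claimed by two teams.
claims = [("checkout", "payments"), ("checkout", "web"), ("search", "discovery")]
print(shared_services(claims))

for week_start, primary, backup in on_call_schedule(
    ["ana", "ben", "chi"], date(2024, 1, 1), weeks=4
):
    print(week_start, primary, backup)
```

A real escalation policy would add further tiers (for example, a manager after the backup) and handle time zones and overrides, which this sketch leaves out.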

As your system grows and changes, make sure to adjust services, teams, and on-call rotations accordingly. This isn’t a set-it-and-forget-it motion. Instead, you should expect to change as your business does. Bake time into quarterly planning to understand how your team is faring. If you’re feeling overwhelmed, bubble up the need for more support. Teams are responsible for giving this feedback to managers, and managers are responsible for escalating accordingly.

Don’t we need some documentation for this?

Each service needs documentation, no matter how small it is. Documentation helps everyone better understand what the service is and does, how it interacts with other services, and what to do when problems arise. With this in mind, these are the most important points to touch on when creating documentation.

Naming and describing: The best service names aren’t the clever ones. When naming a service, try to think of the simplest, most descriptive way to say what it does. This helps eliminate confusion down the line as you grow and scale. Make sure your description is equally informative. The description should answer questions like:

  • What is the intent of this service, component, or slice of functionality?
  • How does this thing deliver value?
  • What does it contribute to?
  • If this is part of a customer-facing feature, explain how this will impact customers and how it rolls up to the larger business component.

Determining dependencies: Services don’t operate in a vacuum. Our jobs would be much easier if an issue in one service were isolated and didn’t affect any other services. But that is rarely the case, especially as we move further towards microservices. You need to know which services yours depends on and which services depend on yours.

At this point, it’s extremely valuable to create a service graph that shows both the technical and business services and how they map to each other. Ideally, this would be a dynamic tool that would allow you to understand how failure in one part of the system affects the rest of the system as a whole.
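To make the idea of a service graph concrete, here is a minimal Python sketch. It assumes the graph is stored as a mapping from each service to the services it depends on, and it walks the graph in reverse to find everything affected when one service fails. The service names and the `impacted_by` function are hypothetical and exist only for this example.

```python
from collections import deque

# Hypothetical graph: each service maps to the services it depends on.
DEPENDS_ON = {
    "checkout": ["payments", "inventory"],
    "payments": ["database"],
    "inventory": ["database"],
    "search": ["inventory"],
    "database": [],
}


def impacted_by(failed, depends_on):
    """Return every service that directly or transitively depends on `failed`."""
    # Invert the graph: for each service, who depends on it?
    dependents = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(service)

    # Breadth-first walk outward from the failed service.
    impacted = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, ()):
            if service not in impacted:
                impacted.add(service)
                queue.append(service)
    impacted.discard(failed)  # only relevant if the graph contains a cycle
    return impacted


print(sorted(impacted_by("inventory", DEPENDS_ON)))
```

The same traversal could be extended to business services by adding them as nodes that depend on technical services.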

Beyond mapping these dependencies, you should have communication plans for them. How will you alert dependent services when you experience an incident? How will you communicate technical problems to other line-of-business stakeholders? Laying out these plans ahead of time can help you think of incidents in terms of business response.

Runbooks: Runbooks are an important tool for teams. They’re like a cheat sheet for each service. Make sure you document how to complete common tasks and resolve common incidents. As you become more familiar with your service, you can even build automation into your runbooks. This automation can range from lightweight context gathering and script running to advanced auto-remediation sequences that eliminate the need for human involvement in some incidents.

Whatever stage your runbooks are at, it’s key to update these regularly. If you notice something is incorrect in a runbook during response, flag it and go back to it later. Runbooks only work if they’re reflective of the current state. Create time and space to keep these assets up to date.
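One lightweight way to make time for this is to flag runbooks that haven’t been reviewed recently. The Python sketch below assumes you record a last-reviewed date for each runbook; the 90-day threshold, the runbook names, and the `stale_runbooks` function are hypothetical choices for illustration.

```python
from datetime import date


def stale_runbooks(last_reviewed, today, max_age_days=90):
    """Return runbook names overdue for review, oldest first.

    `last_reviewed` maps each runbook name to the date it was last reviewed.
    """
    overdue = [
        (reviewed, name)
        for name, reviewed in last_reviewed.items()
        if (today - reviewed).days > max_age_days
    ]
    return [name for _, name in sorted(overdue)]


# Hypothetical review log.
reviews = {
    "restart-queue": date(2024, 1, 1),
    "failover-db": date(2024, 5, 1),
    "rotate-keys": date(2023, 11, 1),
}
print(stale_runbooks(reviews, today=date(2024, 6, 1)))
```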

And remember that runbooks aren’t a cure-all. You can’t plan for and map out resolution instructions for every incident. As your system grows, you’ll encounter novel incidents. A runbook is a tool, not a silver bullet.

How do I know what success looks like?

True success comes from the entire organization adopting service ownership. You’re never done with this initiative, as services and their needs and dependencies are constantly changing. However, you can use metrics to understand how your service is performing. And you can talk to your team and understand qualitatively how they feel about this change.

To understand service performance, you can look at a variety of tools. First, you can use analytics to understand how noisy the service is, how often your team is paged, and when those interruptions occur. This can give you an understanding of how healthy your service is in the eyes of the team supporting it.
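As a simple illustration of this kind of analysis, the Python sketch below counts pages per service and how many of them landed outside working hours. It assumes page events are available as (service, timestamp) pairs in the responders’ local time; the 9-to-17 weekday window and the `page_summary` function are assumptions made for this example.

```python
from collections import Counter
from datetime import datetime


def page_summary(pages, day_start=9, day_end=17):
    """Summarize pages per service as {service: (total, off_hours)}.

    A page counts as off-hours if it arrives on a weekend or outside
    the [day_start, day_end) hour window on a weekday.
    """
    totals = Counter()
    off_hours = Counter()
    for service, when in pages:
        totals[service] += 1
        is_weekend = when.weekday() >= 5  # Saturday is 5, Sunday is 6
        if is_weekend or not (day_start <= when.hour < day_end):
            off_hours[service] += 1
    return {service: (totals[service], off_hours[service]) for service in totals}


# Hypothetical page log.
pages = [
    ("checkout", datetime(2024, 1, 1, 10, 0)),  # Monday, working hours
    ("checkout", datetime(2024, 1, 1, 3, 0)),   # Monday, overnight
    ("search", datetime(2024, 1, 6, 12, 0)),    # Saturday
]
print(page_summary(pages))
```

A service with a high share of off-hours pages is a good candidate for a closer look at alert tuning or reliability work.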

If you want to know how your service is performing in the eyes of your customers, there’s a tool for that as well. SLOs, or service level objectives, are internal targets used to measure the reliability of a service. An SLO defines how much failure a service can experience before customers become unhappy, and it is built on SLIs (service level indicators).

If you’re within the acceptable level of failure (also known as the error budget), your service will be perceived by customers as reliable. If you are not meeting your SLO, it’s likely your customers are unhappy with your performance.
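To show the arithmetic behind an error budget, here is a minimal Python sketch. It assumes a request-based SLI (good events out of total events) and an SLO expressed as a fraction such as 0.999; the `error_budget_remaining` function and the sample numbers are hypothetical.

```python
def error_budget_remaining(slo_target, total_events, bad_events):
    """Return the fraction of the error budget still unspent.

    1.0 means no budget has been used, 0.0 means it is exactly used up,
    and a negative value means the SLO has been missed.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    allowed_bad = (1 - slo_target) * total_events
    return 1 - bad_events / allowed_bad


# With a 99.9% target over 1,000,000 requests, about 1,000 may fail.
# 250 failures therefore leaves roughly three quarters of the budget.
print(round(error_budget_remaining(0.999, 1_000_000, 250), 3))
```

Time-based SLOs work the same way, with minutes of downtime in place of failed requests.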

SLOs are great tools for putting metrics to reliability and demonstrating the value of service ownership. But they’re not the only way to measure success. You also need to speak to your teams to understand their feelings.

Open discussion with teams can help bolster confidence and increase psychological safety. This is extremely important as you will encounter failure along the way. You may not size your teams correctly at the beginning of your journey, and some services might be strapped for support. You may not have the right SLOs, and need to recalibrate. Whatever the challenge you encounter, you need to stay blameless.

These hurdles mean you’re learning and improving. If you can approach them with a positive attitude and listen to the service owners, you’ll improve the reliability of your services, your system as a whole, and the happiness of your teams.

What’s my next step?

Increasing your digital operations maturity is a long road, but one worth traveling. It’s beneficial for your team, the services you run, and your customers. Adopting a service ownership mindset isn’t the only way to make these improvements, but it is a key component.

If you’re looking to learn more about service ownership, you can read our Ops Guide or watch this on-demand webinar. If you want to learn more about planning for digital operations maturity, check out our eBook. And, if you’d like to see how PagerDuty can help you move the needle on initiatives like full-service ownership (FSO) and operational maturity, try us for free for 14 days.