How to Ace Your Services with PagerDuty
It’s finals week for the US Open, one of the most celebrated sports events in the world. Tennis is my favorite sport to watch, as I’m fascinated by the strength, composure and endurance each player displays while standing alone on the court, sometimes during incredibly long matches: the current record is 11 hours and 5 minutes.
Tennis players are fully accountable for the outcome of their matches at every single stage. Their performance directly impacts whether they win or lose. If this sounds familiar, that’s because it is. Service Ownership follows the same approach: “you build it, you own it”. In the context of DevOps, you’re not working alone. But there are definitely lessons to learn from tennis when it comes to building healthy, resilient services.
The parallel became clear to me while I was interviewing Leeor Engel, Director of Engineering for the Incident Response product line. Keep reading to find out his take on how to ace services and how the PagerDuty team used PagerDuty’s own Service Standards functionality to improve the overall maturity of their services.
What is Service Standards?
When pivoting to a Service Ownership model, organizations often struggle to get clear visibility into their many services and to standardize how those services are configured. Launched a year ago for all PagerDuty plans, Service Standards guides teams toward better-configured services, while helping managers and administrators scale these standards across the organization.
With Service Standards, PagerDuty provides nine standards that each service should fulfill to have the depth and context required to be considered well-configured. Each standard can be toggled on or off.
PagerDuty’s Customer Zero: PagerDuty
After the launch of Service Standards, PagerDuty was its own customer zero. Leeor walks us through the motivation behind this effort: “You wanna get adoption and figure out what the gaps are, get feedback, figure out ways to improve [the product]. Then there was an organizational goal. We talk a lot about what makes a service well configured and what does good look like. So we did a big push to get PagerDuty to be customer zero for that feature. We basically got every team to review all their services. And we actually found that many services did not meet the standards.”
Services varied considerably in their standard compliance, but “under 50%” were fully compliant. Approximately four months later, the goal to reach 100% compliance was achieved. But it’s a constant work in progress to keep it that way: “It can be very difficult, depending on the type of service, to get 10 out of 10 [standards]. So our goal was to get 100% of services to be at least 80% compliant. We got there. But then there’s an ongoing effort to maintain that because new services are created all the time, and it’s easy to forget this. And so our continuous process is what catches those stragglers and gets them compliant.”
If you also want to ace your services, here are four lessons you can draw from tennis dynamics to get there:
You might have identified the need to standardize your services to play in the best-practices court. But maybe your organization has dozens, even hundreds, of services and that feels overwhelming. Where and how should you start?
Lesson #1: Start with the baseline
In tennis, the baseline is where each game begins. It’s where players serve and it’s the foundation for their positioning and strategy. Without a well-developed baseline play, there’s no chance of winning. But it needs to be built gradually.
Similarly, standards work as a service’s baseline level of quality, consistency, and functionality. It’s not about achieving perfection from the outset but rather about having a structured foundation to build upon. Take it from Leeor: “You want to focus on systemic things and define any standard as a starting point. Don’t worry about it being perfect. Just get it in place and have a continuous monitoring regime. And that’s gonna move the needle the most, because that’s going to expose all these other problems you might have in your processes that you need to improve, whatever it might be. It’ll be sort of the gateway to exposing those things and then addressing them, continuously improving.”
Lesson #2: Adapt to the surface
Every tennis player has their own style of play, but they must adapt to the surface they’re playing on, each enabling different dynamics. On grass, for example, rallies are usually shorter, as the ball bounces low and players need to get to it faster – playing the net successfully and mastering the volley is key to success.
In the context of services, recognizing each team’s unique circumstances is a crucial first step when determining which standards that team’s service should follow. As Leeor explains, “teams can have pretty different needs in terms of their services. Sometimes their integration set up is a little bit different. Sometimes they’re not monitoring things that are directly based on code deployments. For example, one of our Service Standards is having at least one change integration – we may have services that don’t. They may be triage services that have email integrations or things like that. Those services still provide value and they need a standard, but they need a slightly different one. There isn’t a one-size-fits-all that works for everyone.”
Win the game
The foundations are set: you have defined your service’s boundaries and standards according to the needs of the team that owns it. Now you need to ensure those standards are complied with. How?
Lesson #3: Avoid unforced errors
An unforced error happens when a player loses a point through their own mistake, on a shot that was completely within their control, rather than because of pressure from the opponent.
Teams are responsible for keeping their service standards in check, but in the fast-paced DevOps world that can be tough; services change or new ones might be created depending on business needs. Leeor highlights three essential steps to successfully maintain the balance of your service standards and avoid the unforced error trap:
- Monitor: With the new PagerDuty Service Standards API, you can pull your service standards on a regular basis. This lets you confirm whether the standards still match each service’s needs, whether they need to change, or whether it makes sense to create exemptions.
- Report: Create a reporting regime where you define a regular cadence to assess the state of all the services. With PagerDuty Service Standards it’s easy to do so, as the service performance data can be exported out of PagerDuty by the admins and shared as needed to drive accountability and show progress. Admins also have the option to make standards publicly available for the rest of the organization to view.
- Educate and be educated: Leeor explains how talking directly and frequently with team owners can raise awareness and educate on the importance of complying with service standards: “For example, business services were not uniformly used across all teams and it’s actually pretty useful. Even just to have a parent business service for your area. Then you can leverage capabilities like the Service Graph or Business Impact features. A system where you can see all your services at a bird’s eye view.” It can also help surface different use cases: “Over time, we developed this process where we could have some exemptions. An example would be testing a service that isn’t in production yet, and it doesn’t yet have the escalation policy. So we set up an exemption process – which ideally was temporary – and we set up some exclusions around specific standards.”
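To make the monitor, report and exempt loop a little more concrete, here is a minimal Python sketch of the kind of check a team could run on a regular cadence. It is an illustration, not PagerDuty’s implementation: the `ServiceScore` shape, the standard names and the service names are all hypothetical, and the data is assumed to have already been pulled (for example, through the Service Standards API) and converted into this form. The 80% threshold mirrors the goal Leeor describes above.

```python
from dataclasses import dataclass, field


# Hypothetical data shape for illustration; this is NOT the PagerDuty API
# response format. Assume the scores were already fetched and converted.
@dataclass
class ServiceScore:
    name: str
    # Maps a standard's (made-up) name to whether the service passes it.
    results: dict[str, bool]
    # Standards this service is exempt from, e.g. a triage service with
    # no change integration, or a service that is not in production yet.
    exemptions: set[str] = field(default_factory=set)


def compliance(score: ServiceScore) -> float:
    """Return the fraction of applicable (non-exempt) standards that pass."""
    applicable = {
        standard: passed
        for standard, passed in score.results.items()
        if standard not in score.exemptions
    }
    if not applicable:
        # Nothing applies to this service, so there is nothing to fail.
        return 1.0
    return sum(applicable.values()) / len(applicable)


def stragglers(scores: list[ServiceScore], threshold: float = 0.8) -> list[str]:
    """Return the names of services whose compliance is below the threshold."""
    return sorted(s.name for s in scores if compliance(s) < threshold)


if __name__ == "__main__":
    scores = [
        ServiceScore(
            "triage-inbox",
            {"has_description": True, "has_escalation_policy": True,
             "has_change_integration": False},
            exemptions={"has_change_integration"},
        ),
        ServiceScore(
            "new-service",
            {"has_description": True, "has_escalation_policy": False,
             "has_change_integration": False},
        ),
    ]
    for name in stragglers(scores):
        print(f"Needs attention: {name}")
```

Keeping exemptions as explicit data, rather than silently skipping services, means the report still shows which standards were waived and for whom, which makes it easier to revisit temporary exemptions later.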
Win the match
Lesson #4: Continuously improve
The beauty of tennis is that the course of a match can change instantly. There is no time limit to a game or even a set, and players depend on more than the variables they can control: there’s the opponent’s focus and physical condition, the weather, and even the audience. Are they cheering you on?
Tennis is a game of continuous improvement, and the same goes for services. Well-configured services help scale Service Ownership best practices which, in turn, raise the organization’s operational maturity level.
Here’s Leeor’s number one piece of advice to get there: “The key thing is reporting. Of course you need to establish what your standard is and that may look a little different depending on the business. But really the critical thing is the continuous monitoring and reporting. Mistakes happen, things get missed, humans are humans, right? So you need some process that catches the things that fall through the cracks. Define a standard and continuously monitor it, like you would do with any other process. You’re trying to continuously improve. You need to monitor it.”
Start Acing Your Services
Put all these lessons into practice with the PagerDuty Operations Cloud, the essential platform to get your services in shape and manage all unplanned, time-sensitive, critical work across the enterprise. Learn more here and try our free 14-day trial.