August 24, 2017
Putting together a dream team takes more than just picking good players. Paying millions for a phenom pitcher just to put him in the outfield wouldn't do you much good. An outfielder on the pitcher's mound? Come on now. The same goes in the game of ops. Without knowing the vital stats of your on-call responders, it's impossible to build a top-performing team. A good coach can only develop an all-star lineup by knowing each player's stats.
Enter User Reporting, the latest addition to PagerDuty's Advanced Analytics suite. User Reporting helps managers and teams understand how individual team members respond to incidents. Managers can now see how many incidents each responder has received, acknowledged, reassigned, or missed. With this information, managers can work with their teams to make sure every team member is in the right position and that the workload is spread evenly across the team.
It's tempting to think acknowledgement and escalation metrics measure a responder's work ethic. Escalations could, of course, point to responders who consistently ignore critical incidents out of laziness. But that's the exception, not the rule. There's usually a deeper story at play. Factors such as workload and the severity of assigned incidents weigh heavily on how users respond to critical incidents. If a critical incident escalates before the on-call responder has a chance to acknowledge it, they may simply be fielding too many plays to take on new inbound incidents, or their notification rules may not be configured correctly.
It's a similar story for incidents that responders manually reassign to others: the underlying cause is usually something other than a lack of initiative. An on-call responder may recognize they're too busy to address the issue in a timely fashion, for example, and pass the ball so they can stay heads-down on their current work. A reassignment may also point to a user who isn't well equipped to resolve an incident and hands it off up the food chain to someone who is. Oftentimes incidents are simply assigned to the wrong responder, and a manual escalation represents responsibly forwarding the issue to someone who can fix it. In all of these cases, a manual escalation is not a sign of a weak player; quite the opposite, in fact.
User Reporting gives you data to identify the 'what' instead of just the 'who' when it comes to bottlenecks in the response process. Pair it with techniques aimed at uncovering the 'why,' and you get even closer to the golden unicorn of efficient critical incident resolution. The blameless post-mortem, for example, is a great way to use the new visibility afforded by User Reporting to understand why things happen the way they do, while the personal response benchmarks in user reports help you home in on what 'better' means. Going further still, techniques like the Five Whys can guide the discussion toward the root cause of issues so you can optimize your team's incident response process. Whatever techniques you use in the end, quicker response times let your team spend more energy proactively drawing up plays to eliminate the root causes of problems instead of reactively swinging at every pitch that comes your way.
User Reporting is a part of PagerDuty’s Advanced Analytics suite, available on the Standard and Enterprise plans.
Read more here about how to get started.