by Joseph Mandros
March 22, 2019
The role of the software developer has been rapidly changing. As a developer, you already know that your involvement doesn’t end when you deploy a service to production. Now it extends into managing that service and being on-call for production issues whenever they occur.
That’s a lot of responsibility. The first time you went on-call, you might have felt worried and uncertain. What if something breaks? What if I can’t fix it? Eventually, something did break and you did fix it. And that felt awesome!
On-call can be kinda scary, at least in the beginning. But it’s also rewarding. It’s rewarding not only because it feels great to really crush an issue, especially when it hasn’t even impacted your customers yet, but also because it makes you better at what you do. It means you own the customer experience and it helps you build empathy with your customer. It also helps you build empathy with yourself, or rather, with your future self who will someday end up squinting at a service health dashboard at 3:00 am and trying to figure out what the heck is going on.
You can’t get that first-hand experience with just training. And it’s through that first-hand experience that you learn how to build services that are more resilient, that scale better, that fail more gracefully, and that tell you what’s wrong when something breaks. And that is definitely rewarding.
We at PagerDuty have been building new capabilities and APIs to make on-call as painless as possible, so you can get the benefits of the on-call experience without the worry and uncertainty that sometimes comes with it.
The first things you’ll need to know are (a) when you’re on-call and (b) what services you’re on-call for. On-Call Timeline makes both easy to see. The visual timeline shows your on-call shifts with a detailed listing of the escalation policies and levels for each one. The convenient ‘now’ indicator makes it easy to see when your current shift ends or your next shift begins.
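To make the ‘now’ indicator concrete, here is a minimal sketch of the question it answers: given a list of shifts, which one am I in right now, and which one comes next? The `Shift` class and `now_indicator` function are hypothetical illustrations, not PagerDuty's data model or API.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Shift:
    # Hypothetical shape of one on-call shift (not PagerDuty's schema).
    escalation_policy: str
    level: int
    start: datetime
    end: datetime


def now_indicator(shifts: list[Shift], now: datetime) -> tuple[Optional[Shift], Optional[Shift]]:
    """Return (current shift, next upcoming shift) relative to `now`."""
    current: Optional[Shift] = None
    upcoming: Optional[Shift] = None
    for s in sorted(shifts, key=lambda s: s.start):
        if s.start <= now < s.end:
            # If shifts overlap, report the active one that ends soonest.
            if current is None or s.end < current.end:
                current = s
        elif s.start > now and upcoming is None:
            # Shifts are visited in start order, so the first future one is next.
            upcoming = s
    return current, upcoming


def utc(hour: int) -> datetime:
    return datetime(2019, 3, 22, hour, tzinfo=timezone.utc)


shifts = [Shift("Checkout", 1, utc(0), utc(8)), Shift("Payments", 2, utc(12), utc(20))]
current, upcoming = now_indicator(shifts, utc(6))
print(f"Current shift ends {current.end:%H:%M}; next shift starts {upcoming.start:%H:%M}")
```

Sorting by start time keeps the "next shift" lookup a single pass, and comparing timezone-aware datetimes avoids off-by-hours mistakes when teammates sit in different zones.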
A key benefit of PagerDuty is that it brings together events from all your different monitoring systems and provides powerful tools to help you leverage that event data. With the new Events API v2 you can easily integrate your monitoring systems and normalize the event data into the PagerDuty Common Event Format (PD-CEF). This allows you to view the alert details in PagerDuty in a consistent format without worrying about the different names that monitoring tools use for the same fields.
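As a rough illustration of normalization, the sketch below maps alerts from two imaginary monitoring tools, which name the same fields differently, onto one Events API v2 style trigger event. Our understanding is that such an event carries `routing_key` and `event_action` at the top level, plus a `payload` with `summary`, `source` and `severity` (one of `critical`, `error`, `warning`, `info`); it would be POSTed as JSON to `https://events.pagerduty.com/v2/enqueue`. The tool names, field mappings and fallback severity are assumptions made for this example, and the code only builds the event without sending it.

```python
import json

VALID_SEVERITIES = {"critical", "error", "warning", "info"}

# Hypothetical field mappings for two imaginary tools; not real integrations.
FIELD_MAPS = {
    "tool_a": {"summary": "message", "source": "host", "severity": "level"},
    "tool_b": {"summary": "title", "source": "node", "severity": "priority"},
}


def to_events_v2(tool: str, raw: dict, routing_key: str) -> dict:
    """Build an Events API v2 style trigger event from a tool-specific alert."""
    mapping = FIELD_MAPS[tool]
    severity = str(raw[mapping["severity"]]).lower()
    if severity not in VALID_SEVERITIES:
        severity = "error"  # assumption: fall back instead of rejecting the alert
    mapped_fields = set(mapping.values())
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {
            "summary": raw[mapping["summary"]],
            "source": raw[mapping["source"]],
            "severity": severity,
            # Keep every unmapped field so no context is lost.
            "custom_details": {k: v for k, v in raw.items() if k not in mapped_fields},
        },
    }


event_a = to_events_v2("tool_a", {"message": "Disk full", "host": "db-1", "level": "CRITICAL", "mount": "/var"}, "YOUR_ROUTING_KEY")
event_b = to_events_v2("tool_b", {"title": "Disk full", "node": "db-1", "priority": "P1"}, "YOUR_ROUTING_KEY")
print(json.dumps(event_a, indent=2))
```

Once both tools' alerts share the same `summary`/`source`/`severity` names, anything downstream can treat them identically.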
The Operations Command Console helps you visualize your alert data. You can use it to quickly identify noisy services and then use suppression to avoid getting notified for non-actionable events. When you do get paged for something actionable, the Infrastructure Health Application lets you instantly visualize the ‘blast radius’ of the issue to see whether the problem is localized or widespread.
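The sketch below illustrates the underlying idea behind spotting noisy services and suppressing non-actionable events: count alerts per service, flag services above a threshold, and check alerts against suppression rules. The alert dictionaries, threshold and rule format are hypothetical; this is not how the Operations Command Console or PagerDuty's suppression rules are actually configured.

```python
from collections import Counter
from typing import Callable

Alert = dict
# Hypothetical rule format: (service name, predicate that marks an alert non-actionable).
Rule = tuple[str, Callable[[Alert], bool]]


def noisy_services(alerts: list[Alert], threshold: int) -> list[str]:
    """Services with at least `threshold` alerts, noisiest first."""
    counts = Counter(a["service"] for a in alerts)
    noisy = [svc for svc, n in counts.items() if n >= threshold]
    return sorted(noisy, key=lambda svc: (-counts[svc], svc))


def should_suppress(alert: Alert, rules: list[Rule]) -> bool:
    """True if any rule for this alert's service marks it non-actionable."""
    return any(alert["service"] == svc and pred(alert) for svc, pred in rules)


alerts = [{"service": "cache", "summary": "key evicted"}] * 5 + [
    {"service": "checkout", "summary": "HTTP 5xx rate high"}
] * 2
rules: list[Rule] = [("cache", lambda a: a["summary"] == "key evicted")]
print(noisy_services(alerts, threshold=3))
```

Keeping rules as explicit (service, predicate) pairs makes it clear that suppression is scoped to one service and never silently hides alerts elsewhere.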
When you need the detailed view, Alert Search lets you customize your view of the Alerts table and quickly find the relevant information using normalized fields, which is critical in minimizing cognitive load. You can add the columns you want and remove those that you don’t. Then you can search, filter, and sort the columns to quickly find what you’re looking for.
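Conceptually, that workflow is a projection, a filter and a sort over normalized fields. Here is a minimal sketch of those three steps; the `search_alerts` function and the field names are hypothetical and do not reflect how Alert Search is implemented.

```python
def search_alerts(alerts: list[dict], columns: list[str], filters: dict,
                  sort_by: str, descending: bool = False) -> list[dict]:
    """Filter alerts on exact field matches, sort them, and keep only chosen columns."""
    # Filter: keep alerts whose normalized fields match every filter value.
    rows = [a for a in alerts if all(a.get(k) == v for k, v in filters.items())]
    # Sort: ISO 8601 timestamps in the same format sort correctly as strings.
    rows.sort(key=lambda a: a[sort_by], reverse=descending)
    # Project: show only the columns the user added.
    return [{c: a.get(c) for c in columns} for a in rows]


alerts = [
    {"summary": "Disk full", "source": "db-1", "severity": "critical", "created_at": "2019-03-22T03:00:00Z"},
    {"summary": "High latency", "source": "web-2", "severity": "warning", "created_at": "2019-03-22T03:05:00Z"},
    {"summary": "Replica lag", "source": "db-2", "severity": "critical", "created_at": "2019-03-22T03:10:00Z"},
]
view = search_alerts(alerts, columns=["summary", "created_at"],
                     filters={"severity": "critical"}, sort_by="created_at", descending=True)
for row in view:
    print(row)
```

Because every alert uses the same normalized field names, one filter like `severity == "critical"` works across all monitoring tools at once.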
With the enhanced Incident Creation API, you can automate the process of creating a new incident to represent the underlying problem you need to fix. Then you can use the Incidents Merge API to merge the relevant alerts that describe the symptoms of the problem into the incident that represents the problem as a whole. This lets you focus your response on a single PagerDuty incident that contains all the relevant alerts, giving the response team context and leaving a unified record of the response for later analysis and the postmortem.
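A rough sketch of that two-step flow against the REST API v2 follows. Based on our reading of the public API reference, an incident is created with `POST /incidents`, and `PUT /incidents/{id}/merge` takes a list of `source_incidents` whose alerts move into the target incident; both calls expect a `From` header with a valid user's email. Please check the current API reference before relying on these shapes. The token, email and IDs (`PSVC123`, `PINC1` and so on) are hypothetical placeholders, and the code only builds the requests without sending them.

```python
import json
import urllib.request

API_BASE = "https://api.pagerduty.com"


def _headers(api_token: str, from_email: str) -> dict:
    return {
        "Authorization": f"Token token={api_token}",
        "Accept": "application/vnd.pagerduty+json;version=2",
        "Content-Type": "application/json",
        "From": from_email,  # email address of a valid PagerDuty user
    }


def create_incident_request(api_token: str, from_email: str,
                            service_id: str, title: str) -> urllib.request.Request:
    """Build (but do not send) a request that creates the 'problem' incident."""
    body = {"incident": {"type": "incident", "title": title,
                         "service": {"id": service_id, "type": "service_reference"}}}
    return urllib.request.Request(f"{API_BASE}/incidents", data=json.dumps(body).encode(),
                                  headers=_headers(api_token, from_email), method="POST")


def merge_request(api_token: str, from_email: str,
                  target_id: str, source_ids: list[str]) -> urllib.request.Request:
    """Build (but do not send) a request merging source incidents into the target."""
    body = {"source_incidents": [{"id": i, "type": "incident_reference"} for i in source_ids]}
    return urllib.request.Request(f"{API_BASE}/incidents/{target_id}/merge",
                                  data=json.dumps(body).encode(),
                                  headers=_headers(api_token, from_email), method="PUT")


create_req = create_incident_request("YOUR_API_TOKEN", "oncall@example.com",
                                     "PSVC123", "Checkout latency degraded")
merge_req = merge_request("YOUR_API_TOKEN", "oncall@example.com", "PINC1", ["PINC2", "PINC3"])
# To actually send one: urllib.request.urlopen(create_req)
print(create_req.get_method(), create_req.full_url)
print(merge_req.get_method(), merge_req.full_url)
```

In practice you would read the new incident's ID from the create response and pass it as `target_id` to the merge call.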
Our workflows also go beyond responding to monitoring alerts. Since issues can surface wherever you are, you can now create incidents manually from the PagerDuty mobile app.
Maybe you aren’t on-call but you just noticed an issue with your checkout process. Or your service provider just told you there’s an urgent problem with your account. Create an incident with the mobile app and get the response underway immediately.
At PagerDuty, we know what it’s like to be on-call — we’re on-call too. That’s why we’re committed to providing the best solution available to support developers on-call. These new enhancements and capabilities are part of that commitment. They’ve already made our on-call experience better and now we’re excited to make them available to you.