PagerDuty Blog

Summit Day One: Delivering New Machine Learning Capabilities to Cut Costs and Outages

At PagerDuty, we release new capabilities every month (check out our What’s New page for the latest updates). But while we ship product continuously, we also save a plethora of new and improved capabilities to share with our customers at PagerDuty Summit, our annual customer event.

In this post, we’ll explain the new capabilities that were unveiled today at Summit and how they give you greater visibility and intelligence to help you make better decisions in the moment, as well as help you continually improve your operational performance.

Why This Matters

In today’s keynote, PagerDuty’s CEO Jennifer Tejada recapped how digital business has evolved over the last 10 years—for example, in 2009 there were 100,000 apps in the Apple App Store (compared to over 2 million today) and the cloud barely existed. But as Jennifer pointed out, digital business is only delivering on its promise if it improves customer experience—and the more it grows, the more difficult it is for technology teams to keep everything running smoothly.

For instance, if customers are having a poor experience because of outages or slowdowns even when everything is “up” according to a dashboard, then that dashboard isn’t terribly valuable. It’s increasingly important to tie technical data to business outcomes both in real time (what we call moments of action) and afterwards (what we call moments of reflection).

How Do You Do This?

There are three things you need to successfully tie technical data to business outcomes:

  • Visibility over very complex systems of technology and people
  • Intelligence, or thoughtful use of automation and machine learning
  • A culture of real-time operations that permeates not only technical teams, but your entire organization

When things go wrong, visibility helps organizations quickly establish a clear understanding of not only which systems are impacted and where the problems may lie, but also who is working on the issues and what their roles are, even if some of them aren’t technical professionals. After all, a poor customer experience may start with a system outage, but it escalates when customers are left in the dark because communications teams aren’t properly involved in the outage response. It’s worse still if your company is one of the 51% of firms that routinely find out about outages from their customers first, a finding from PagerDuty’s 2019 State of Unplanned Work Report.

Intelligence means using cutting-edge innovations in automation and machine learning to reduce noise so response teams can quickly home in on the impact and scope of issues. Rather than automating people out of jobs, thoughtful use of automation means making them more effective at their jobs; in other words, we use intelligence to bring useful insights to skilled staff and augment their effectiveness.

Finally, establishing a culture and mindset of real-time operations across an organization, as well as one of continuous improvement, means everyone in the company sees it as their job to serve the customer and provide them with the best experience around the clock. It’s no longer just IT’s responsibility to keep servers up—instead, everyone has a role to play.

Our Announcements

Today, we are announcing two significant enhancements to our Event Intelligence and Analytics products that reflect these themes.

When we debuted Event Intelligence last year, we also introduced Intelligent Alert Grouping, which uses machine learning to group alerts together so teams don’t receive multiple alerts coming from related issues. The next step is Intelligent Triage, a new capability that provides context about an incident: for example, whether it has happened before, how it was resolved, how widespread it is, what services are affected, who is working on it, and how they can be reached.

By immediately arming teams with this knowledge, PagerDuty helps organizations pull together the right people, with the right information, to solve problems faster, minimizing the cost of downtime and preventing poor customer experiences.
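
To make the idea of grouping related alerts concrete, here is a minimal Python sketch. It is not PagerDuty’s algorithm: Intelligent Alert Grouping uses machine learning, while this toy version relies on a fixed word-overlap score and a time window. The Alert class, the group_alerts function, and the window and threshold values are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Alert:
    """Hypothetical alert record; the field names are illustrative only."""
    summary: str
    created_at: datetime


def token_similarity(a: str, b: str) -> float:
    """Jaccard similarity between the lowercase word sets of two summaries."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def group_alerts(alerts, window=timedelta(minutes=5), threshold=0.5):
    """Greedily place each alert in the first group whose most recent alert
    is close in time and has a similar summary; otherwise start a new group.

    The window and threshold defaults are assumed values, not product settings.
    """
    groups = []
    for alert in sorted(alerts, key=lambda a: a.created_at):
        for group in groups:
            last = group[-1]
            close_in_time = alert.created_at - last.created_at <= window
            similar = token_similarity(alert.summary, last.summary) >= threshold
            if close_in_time and similar:
                group.append(alert)
                break
        else:
            # No existing group matched, so this alert starts its own group.
            groups.append([alert])
    return groups


# Example: two similar alerts a minute apart, one unrelated alert,
# and one similar alert that arrives much later.
start = datetime(2019, 9, 24, 10, 0)
example_alerts = [
    Alert("High CPU on web-1", start),
    Alert("High CPU on web-2", start + timedelta(minutes=1)),
    Alert("Disk full on db-1", start + timedelta(minutes=2)),
    Alert("High CPU on web-3", start + timedelta(minutes=30)),
]
example_groups = group_alerts(example_alerts)
```

In this example, the two CPU alerts that arrive a minute apart land in one group, while the disk alert and the much later CPU alert each start their own. A learned model would replace the fixed similarity score and threshold with relationships inferred from historical data.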

“IT organizations are being swamped by the ever-growing number of incidents and events, generated by multiple platforms, and are looking for better tools to help manage and filter this information, prioritize activities, and define response,” said James Governor, analyst and co-founder at RedMonk. “Infrastructure automation is key to running IT organizations effectively. PagerDuty is now designing tools for knowledge automation, allowing teams to respond to incidents as teams, rather than individuals. PagerDuty’s interactive team dashboards and intelligent alert resolution are designed to make teams more effective by reducing organizational cognitive overload and automating common responses and resolutions.”

Intelligent Triage also uses machine learning technology that’s trained on your data to further reduce noise and infer relationships between events. You can give positive or negative reinforcement to the model by interacting with the UI, like the thumbs-up/thumbs-down feedback mechanism shown here.

Intelligent Triage uses machine learning to suggest other related issues that might be occurring at the same time.

Intelligent Dashboards in PagerDuty Analytics bring an exploratory experience to your operational data, augmented by an ML-powered recommendation engine called Spotlight that enables leaders to take immediate action. Having seen our customers struggle with many analytics solutions that provide data visualizations but no clear actions to take, we felt it was critical to provide not only the data, but also concrete recommendations based on data science that digital leaders could follow. We also heard how important it is for leaders to know how their teams are doing in relation to one another and against industry measures, so we’ve included a benchmarking function as well.

Intelligent Dashboards provide prescriptive insights into your operational data, combined with recommendations for actions you can take right away.

Intelligent Dashboards build on what PagerDuty Analytics delivered earlier this year: modern metrics provided through operational review scorecards. Unlike scorecards, which are typically used for weekly, monthly, or quarterly business reviews, Intelligent Dashboards allow you to derive actionable insights from your data at any time.

Learn more

Intelligent Triage and Intelligent Dashboards are just a couple of the innovations we wanted to highlight today; other announcements can be found on our What’s New page. If you are at PagerDuty Summit this week, stop by the PagerDuty booth in the Expo Hall for a demo of these and other features. If you’re not able to join us at Summit, both of these features will be available for customer preview starting in October, so get in touch with your PagerDuty account team to schedule a demo.