The State of Digital Operations

Welcome to the first State of Digital Operations Report, an aggregated view of the volume of real-time work, its growth over time, and the increasing burden that it places on technical teams.

By sharing an aggregated view of what we’re seeing across the PagerDuty platform, we want to help digital and business leaders understand the effect of good operations practices on business impact, operational health, and human factors.

On this page you will find some of the key platform insights we’re highlighting in the report. We will continue to refresh the metrics on this page regularly and look forward to reporting on trends that we’re seeing.

Average Daily Metrics (as of August 2021)

39M events

1.3M alerts

618k interruptions

58k critical incidents

19% year-over-year growth in critical incidents from 2019 to 2020

Critical incidents are defined as those that come from high-urgency services, were not auto-resolved within 5 minutes, were acknowledged within 4 hours, and were resolved within 24 hours.
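The definition above reads as a four-part filter. The following is a minimal sketch of that filter, not PagerDuty's schema or API: the `Incident` record and its field names are hypothetical, and treating the thresholds as inclusive (exactly 5 minutes, exactly 4 hours, exactly 24 hours) is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Incident:
    # Hypothetical fields, chosen only to mirror the definition above.
    high_urgency_service: bool
    created_at: datetime
    acknowledged_at: Optional[datetime]
    resolved_at: Optional[datetime]
    auto_resolved: bool


def is_critical(inc: Incident) -> bool:
    """Return True if the incident meets all four criteria."""
    if not inc.high_urgency_service:
        return False
    # An incident that was never acknowledged or resolved cannot qualify.
    if inc.acknowledged_at is None or inc.resolved_at is None:
        return False

    time_to_ack = inc.acknowledged_at - inc.created_at
    time_to_resolve = inc.resolved_at - inc.created_at

    # Assumption: thresholds are inclusive.
    auto_resolved_quickly = (
        inc.auto_resolved and time_to_resolve <= timedelta(minutes=5)
    )
    return (
        not auto_resolved_quickly
        and time_to_ack <= timedelta(hours=4)
        and time_to_resolve <= timedelta(hours=24)
    )
```

Under these assumptions, a high-urgency incident acknowledged after 10 minutes and resolved after 2 hours counts as critical, while one that auto-resolves after 3 minutes does not.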

Chart 1: Incident volume

98% noise reduction overall using PagerDuty

A variety of noise reduction techniques, including machine learning, compress events down to about 1 million alerts per day, which translate into roughly the same number of actual incidents.

Chart 2: Noise reduction

The average incident costs $126 USD in engineering time

Each incident requires an average of 1.2 responders and takes 126 minutes to resolve. At a cost of $50 USD per responder per hour, that comes to $126 USD per incident. Cost will vary by region, and this figure is just the tip of the iceberg: it does not account for downstream impact on brand reputation, revenue, or employee productivity and morale.
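The arithmetic behind the per-incident figure can be checked directly. This is a small sketch using only the three numbers quoted above; the function name and parameter names are illustrative, and the hourly rate is the report's stated assumption rather than a measured value.

```python
def incident_cost_usd(responders: float, minutes_to_resolve: float,
                      hourly_rate_usd: float) -> float:
    """Engineering-time cost of one incident: responders x hours x rate."""
    hours = minutes_to_resolve / 60
    return responders * hours * hourly_rate_usd


# 1.2 responders x (126 / 60) hours x $50 per responder-hour
cost = incident_cost_usd(responders=1.2, minutes_to_resolve=126, hourly_rate_usd=50)
print(f"${cost:.2f} per incident")  # prints "$126.00 per incident"
```

Substituting a different hourly rate or responder count shows how sensitive the figure is to regional labor costs.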

Chart 3: Cost

Working hours in 2020 were considerably less consistent than in 2019

Humans ultimately sit at the center of incident response, so staying cognizant of overwork that might be happening at organizations is critical for business and technical teams alike.

Chart: +2 hours worked per day, 2019 vs. 2020

Interruptions in the U.S. in 2020 compared to 2019

9% more off-hour interruptions

7% more holiday/weekend hour interruptions

5% more business hour interruptions

3% fewer sleep hour interruptions

Not managing burnout can result in attrition

Our data science team looked at the relationship between users leaving the platform and how often they were involved in off-hour incident resolution. We found a statistically significant correlation: the more frequently users are involved in fixing problems during off-hours, the more likely they are to quit.

Burnout and Churn

In the U.S., Overworked and Burned Out Responders Are Bearing the Burden of Non‑Working Hour Interruptions

Chart 6: Non-working Hour Interruptions for Burned Out Responders vs. Median

Chart 7: Non-working Hour Interruptions for Overworked Responders vs. Median

Teams are getting better at incident response over time

Looking at accounts that have been using PagerDuty for more than five years, it’s clear that they are getting better at incident response as they continue using the platform, with mean time to acknowledge (MTTA) and mean time to resolve (MTTR) both trending down over time.

Chart 8: MTTA by Account Age

Chart 9: MTTR by Account Age
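For readers who want to track the same two metrics on their own data, MTTA and MTTR are averages over per-incident durations. The sketch below is one way to compute them and is not the report's methodology: the timestamps are made-up illustrative values, and measuring time to resolve from incident creation (rather than from acknowledgement) is an assumption.

```python
from datetime import datetime, timedelta
from statistics import mean


def mean_minutes(pairs):
    """Average gap in minutes across (start, end) datetime pairs."""
    return mean((end - start).total_seconds() / 60 for start, end in pairs)


# Hypothetical incidents as (created, acknowledged, resolved) timestamps.
t0 = datetime(2021, 8, 1, 9, 0)
incidents = [
    (t0, t0 + timedelta(minutes=4), t0 + timedelta(minutes=90)),
    (t0, t0 + timedelta(minutes=6), t0 + timedelta(minutes=150)),
]

# MTTA: creation to acknowledgement. MTTR: creation to resolution (assumed).
mtta = mean_minutes([(created, acked) for created, acked, _ in incidents])
mttr = mean_minutes([(created, resolved) for created, _, resolved in incidents])

print(f"MTTA: {mtta:.1f} min, MTTR: {mttr:.1f} min")  # MTTA: 5.0 min, MTTR: 120.0 min
```

Grouping incidents by account age before averaging would give the kind of trend line the charts above describe.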