Welcome to the first State of Digital Operations Report, an aggregated view of the volume of real-time work, its growth over time, and the increasing burden that it places on technical teams.
By sharing an aggregated view of what we’re seeing across the PagerDuty platform, we want to help digital and business leaders understand the effect of good operations practices on business impact, operational health, and human factors.
On this page you will find some of the key platform insights we’re highlighting in the report. We will continue to refresh the metrics on this page regularly and look forward to reporting on trends that we’re seeing.
Average Daily Metrics (as of August 2021)
19% year-over-year growth in critical incidents from 2019 to 2020
Critical incidents are defined as those from high-urgency services that were not auto-resolved within 5 minutes, were acknowledged within 4 hours, and were resolved within 24 hours.
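The definition above can be read as a simple filter over incident records. The following is a minimal sketch under stated assumptions: the `Incident` record and its field names are hypothetical and are not taken from the PagerDuty API, and "not auto-resolved within 5 minutes" is interpreted as excluding only incidents that were closed automatically in 5 minutes or less.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Incident:
    # Hypothetical record shape for illustration; not a PagerDuty API schema.
    created_at: datetime
    high_urgency: bool                   # raised by a high-urgency service
    auto_resolved: bool                  # closed automatically rather than by a responder
    acknowledged_at: Optional[datetime]  # None if never acknowledged
    resolved_at: Optional[datetime]      # None if never resolved


def is_critical(inc: Incident) -> bool:
    """Apply the report's stated criteria for a critical incident."""
    if not inc.high_urgency:
        return False
    if inc.acknowledged_at is None or inc.resolved_at is None:
        return False
    time_to_ack = inc.acknowledged_at - inc.created_at
    time_to_resolve = inc.resolved_at - inc.created_at
    # Assumption: only auto-resolutions that finish within 5 minutes are excluded.
    auto_resolved_quickly = inc.auto_resolved and time_to_resolve <= timedelta(minutes=5)
    return (
        not auto_resolved_quickly
        and time_to_ack <= timedelta(hours=4)
        and time_to_resolve <= timedelta(hours=24)
    )
```

Under this reading, a high-urgency incident acknowledged after 30 minutes and resolved after 2 hours counts as critical, while one that auto-resolves after 3 minutes does not.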
98% noise reduction overall using PagerDuty
A variety of noise reduction techniques, including machine learning, compress events down to about 1 million alerts, which translate into roughly the same number of actual incidents per day.
The average incident costs $126 USD in engineering time
Each incident requires an average of 1.2 responders and takes 126 minutes to resolve, which, at a cost of $50 USD per responder per hour, comes to $126 USD per incident. Cost will vary by region, but this is just the tip of the iceberg: the number does not account for downstream impact on brand reputation, revenue, or employee productivity and morale.
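The arithmetic behind the per-incident figure is responders × hours × hourly rate. A short sketch, using only the numbers quoted above (the function and parameter names are illustrative), makes it easy to substitute a different rate or resolution time:

```python
def engineering_cost_per_incident(
    responders: float, minutes_to_resolve: float, hourly_rate_usd: float
) -> float:
    """Engineering-time cost of one incident: responders x hours x hourly rate."""
    hours_to_resolve = minutes_to_resolve / 60
    return responders * hours_to_resolve * hourly_rate_usd


# Figures from the report: 1.2 responders, 126 minutes, $50 USD per responder per hour.
cost = engineering_cost_per_incident(1.2, 126, 50.0)
print(f"${cost:.2f}")  # prints $126.00
```

The result covers engineering time only; the downstream costs mentioned above would have to be modeled separately.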
Working hours in 2020 were considerably less consistent than in 2019
Humans ultimately sit at the center of incident response, so staying cognizant of overwork that might be happening at organizations is critical for business and technical teams alike.
Interruptions in the U.S. in 2020 compared to 2019, by category: holiday/weekend hour interruptions, business hour interruptions, and sleep hour interruptions
Not managing burnout can result in attrition
Our data science team looked at the relationship between users leaving the platform and how often they were involved in off-hour incident resolution. We found a statistically significant correlation: the more frequently users are involved in fixing problems off hours, the more likely they are to quit.
In the U.S., Overworked and Burned Out Responders Are Bearing the Burden of Non‑Working Hour Interruptions
Teams are getting better at incident response over time
Looking at accounts that have been using PagerDuty for more than five years, it’s clear that they are getting better at incident response as they continue using the platform, with MTTA (mean time to acknowledge) and MTTR (mean time to resolve) both trending down over time.
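For readers who want to track the same two measures on their own data, MTTA (mean time to acknowledge) and MTTR (mean time to resolve) are averages of the time from incident creation to acknowledgement and to resolution. The sketch below assumes each incident is available as a `(created_at, acknowledged_at, resolved_at)` tuple; that shape and the sample timestamps are made up for illustration and are not data from the report.

```python
from datetime import datetime, timedelta
from statistics import mean


def mean_minutes(durations):
    """Average a collection of timedeltas and return the result in minutes."""
    return mean(d.total_seconds() for d in durations) / 60


def mtta_mttr(incidents):
    """Return (MTTA, MTTR) in minutes for (created, acknowledged, resolved) tuples."""
    incidents = list(incidents)
    mtta = mean_minutes(ack - created for created, ack, _ in incidents)
    mttr = mean_minutes(resolved - created for created, _, resolved in incidents)
    return mtta, mttr


# Illustrative sample data only.
t0 = datetime(2021, 8, 1, 9, 0)
sample = [
    (t0, t0 + timedelta(minutes=5), t0 + timedelta(minutes=60)),
    (t0, t0 + timedelta(minutes=15), t0 + timedelta(minutes=120)),
]
mtta, mttr = mtta_mttr(sample)
print(f"MTTA: {mtta:.1f} min, MTTR: {mttr:.1f} min")  # prints MTTA: 10.0 min, MTTR: 90.0 min
```

Computing these per month or per quarter and comparing the values over time is one way to check whether a team's own numbers follow the downward trend described above.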