PagerDuty Blog

Real-Time Operations Maturity: How Businesses Can Thrive in the Digital Era

It’s rare to find a business today that doesn’t rely on digital technologies and services. Retail is one example: whether customers are buying online or in store, completing a transaction requires a website or point-of-sale system. The entire supply chain relies on IT services to deliver goods on time and to the right locations. And as at any company today, every department, from development and marketing to HR and business services, has a critical tech stack.

This increasing reliance on technology as the engine of the business, coupled with user expectations of always-available services, means that technology disruptions or degradations have an immediate impact on customers. It also means that the ability to respond to and resolve issues in real time is more important than ever. Some organizations are set up to handle real-time work, but most are not. They lack the technology to support it, their processes are designed around queued work, and the employees called upon to do this work have neither the necessary knowledge nor the authority to perform it.

Incidents Are Inevitable

In the digital era, thousands of customers may be pushing the proverbial “buy” button at any moment. This means that every moment is a potential “moment of truth”: the moment where you succeed or fail in the eyes of your customers. All of us have had poor experiences or failed transactions, and the ramifications for the affected businesses are huge. The Rand Group reports that for a third of enterprises, an hour of downtime costs $1 million to $5 million.

From research we conducted, we found that organizations experience an average of 22 incidents per month (7 major incidents and 15 minor incidents). Major incidents take an average of 5 hours to resolve, so 7 major incidents per month amount to 35 hours of downtime or degraded service per month. Incident counts vary across organizations, but the takeaway is the same: digital services do sometimes fail. How effectively an organization can predict and avoid issues, and how well it mitigates, responds to, and handles them when they do occur, can mean the difference between having happy customers and having no customers. This is where the ability to work in real time really matters.
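The 35-hour figure is simple multiplication of incident count by mean resolution time. Here is a minimal sketch in Python. It assumes each major incident degrades service for its full resolution time and that incidents never overlap. The function name `monthly_degraded_hours` is hypothetical and is not part of any PagerDuty product or API.

```python
def monthly_degraded_hours(major_incidents_per_month: float, mean_hours_to_resolve: float) -> float:
    """Estimate monthly hours of down or degraded service caused by major incidents.

    Hypothetical helper for illustration. Assumes each incident degrades
    service for its whole resolution time and that incidents do not overlap.
    """
    return major_incidents_per_month * mean_hours_to_resolve


# Survey averages cited in the text: 7 major incidents per month, 5 hours each.
print(monthly_degraded_hours(7, 5))
```

Overlapping incidents would reduce the wall-clock total, so under these assumptions the result is best read as an upper bound on time covered by major incidents, not a measurement.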

But what does excellence in real-time work look like? How does an organization develop that muscle? What are the benchmarks that can be achieved?

The Real-Time Operations Maturity Model

At PagerDuty, we realized there was no way for organizations to measure their real-time operations efficiency—and so we decided to build a method. We constructed the industry’s first Real-Time Operations Maturity Model, based on nine years of working with PagerDuty customers and developing our own best practices. The model lays out what excellence in real-time operations looks like, includes the metrics and behaviors with which to measure maturity, and helps organizations assess how mature they currently are. But perhaps most importantly, it details what benefits you and your customers can expect to see as your organization’s digital maturity improves.

The Real-Time Operations Maturity Model has four different levels:

  1. Reactive organizations tend to discover most issues only when customers report them, and they lack defined processes for responding to those issues. First-line responders often don’t have the skills, knowledge, or authority to resolve most service issues.
  2. Responsive organizations surface more issues before they impact customers. First-line responders have the skills, and are beginning to acquire the authority, to prevent issues. However, they still lack the necessary information to do their jobs.
  3. Proactive organizations surface and resolve most issues before customers are aware and affected. Learnings from past issues are automatically documented and responders are empowered with the knowledge and authority to resolve current issues and prevent future issues.
  4. Preventative organizations surface and resolve almost all issues before they affect customers. This level is extremely difficult to achieve, and the hallmark of an organization operating at this level is a culture of continuous learning. Their responders are fully empowered with the knowledge and authority to resolve current issues and prevent future problems. These organizations heavily utilize automation throughout the real-time response process.

To help determine what level of real-time operations maturity organizations have achieved today, we engaged with IDG to conduct a survey of 600 IT leaders and practitioners in the U.S., the U.K., and the Australia–New Zealand region. Respondents represent industries ranging from technology and finance to communications and manufacturing.

So what did we find? Most organizations still have a long way to go in order to achieve real-time operations maturity and fully realize the benefits.

What Do Mature Organizations Have in Common?

From the survey data, we found that mature organizations learn from past issues: those issues are automatically documented and made available, and improvements are implemented quickly. Their response processes are well defined and coordinated, and they use automation as much as possible to reduce manual work, freeing employees to spend more time on innovation.

Mature companies also complete postmortems for 77 percent of incidents and complete 78 percent of follow-up tasks. They take full advantage of the opportunity to learn from incidents and take steps to implement improvements that reduce the risk of the same incidents recurring.

Real-Time Operations Maturity and the Correlation to Employee Attrition

Only a small share of survey respondents reported that they both measure health at the team and organizational levels and properly manage workloads. This is concerning, because data has shown that being on call can have a major negative impact on responders’ happiness, both at work and in their personal lives. Calls can wake responders at night or pull them away from important family events. It is even worse when they find out that many of those alerts were unactionable anyway, either because they lacked information or because they were false alarms.

As another of our research reports, “The State of IT Work-Life Balance,” points out, the risk of burnout for on-call responders is very real. Burnout leads to high turnover and costs organizations highly skilled technical employees, which can mean even slower response times and more unhappy customers when incidents arise.

But there is good news: The survey results also revealed that organizations can reduce employee burnout and attrition by increasing automation and reducing the number of unactionable alerts. In fact, the data shows that more mature companies create roughly 40 percent fewer incidents for every alert, resolve 40 percent more issues with automation, and experience a 21 percent lower on-call responder employee attrition rate when compared to their less mature counterparts.

Learning and Improving: Not Just a Pipe Dream

Still not convinced about the business impact of effective real-time operations? The IDG survey also found that, compared to their less mature peers, mature organizations:

  • Acknowledge incidents 7 minutes faster
  • Mobilize responders 11 minutes faster
  • Resolve incidents 2 hours faster
  • Average 14 fewer hours of downtime each month

As the survey data shows, having the right technology, processes, knowledge sharing, and culture in place for managing real-time work has a real and large positive impact on your business, your customers, and your bottom line.

To learn more about the model and how you can implement it in your organization, attend our upcoming webinar, IBM & PagerDuty: Driving Real-Time Operations Excellence, at 10 a.m. PT on Tuesday, December 4, or check out our summary of survey results, developed in conjunction with IDG.

Ready to see how mature your organization is? Take an abridged version of the survey.