PagerDuty Blog

What’s the ROI? How Operational Maturity Improves Customer and Team Satisfaction

Are we looking at the new normal now? In the last 18 months, organizations all over the world were compelled to undergo a rapid digital transformation and mature their operations to support services that were under unprecedented strain.

Digital transformation allows companies to embark on large-scale cloud migrations and adopt modern development methods like DevOps and Agile. Yet these changes, coupled with skyrocketing usage, often mean an increase in complexity and thus an increase in incidents for those who operate and own the affected services. And these incidents can negatively impact the end user at a time when customer expectations have never been higher.

One fact will always hold true: incidents are inevitable. There’s no magic wand to make them go away. In fact, they’re only expected to be more common as the complexity of the technology ecosystem increases. This means that teams need to put in place the right response strategies and invest in modern digital operations to keep pace with innovation and continue thriving.

As the world continues its recovery from the COVID-19 pandemic, business leaders need to draw on the lessons learned under shelter-in-place restrictions to continue innovating. As we found in the last 18 months, customer experience, operational efficiency, and talent retention can make a huge difference in an organization’s success. It’s crucial for leaders to continue prioritizing their investment in culture, processes, and solutions to unlock agility at scale and operational maturity.

A report by IDC noted that, “The basic requirement of successful digital operations management is the aggregation, grouping, and correlation of digital signals from an ever-increasing array of sources to create operational awareness.” Companies are turning towards tooling solutions to satisfy this need. In particular, the IDC report noted that companies who adopted PagerDuty experienced:

Infographic showing that PagerDuty users had a 795% ROI, 77% reduced time to troubleshoot issues, and 2.1 months payback on investment.

Whether you’re looking to adopt PagerDuty or a similar solution, it’s important to consider the level of partnership you’ll receive when looking at digital operational management tools, as this investment will have long-lasting effects on your processes and culture.

Unlock ROI and Focus on Business Value

How much does an incident actually cost you? According to a recent report by PagerDuty, each incident requires an average of 1.2 responders and takes 126 minutes to resolve. At a cost of $50 USD per hour per responder (a round placeholder figure based on a $100,000 USD salary), that works out to 1.2 responders × 2.1 hours × $50, or $126 USD per incident. This number does not include any lost revenue which, for some companies, can be hundreds of thousands of dollars per minute.

And this is with teams that have adopted modern incident response with PagerDuty. Teams with more traditional operations processes often need to pull in many more people than 1.2 responders to get answers to key questions during the resolution process.
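To make the arithmetic concrete, here is a minimal sketch of the labor-cost calculation using the figures quoted above. The function name and the five-responder comparison are illustrative assumptions, not numbers from the report, and lost revenue is deliberately left out.

```python
def incident_labor_cost(responders: float, minutes_to_resolve: float,
                        hourly_rate_usd: float) -> float:
    """Responder labor cost of one incident, excluding lost revenue."""
    hours = minutes_to_resolve / 60
    return responders * hours * hourly_rate_usd


# Figures quoted in the text: 1.2 responders, 126 minutes, $50 per hour.
reported = incident_labor_cost(1.2, 126, 50)
print(f"Reported average: ${reported:.2f}")  # prints "Reported average: $126.00"

# Hypothetical comparison (assumption): a team that pulls in five people
# for an incident of the same duration.
hypothetical = incident_labor_cost(5, 126, 50)
print(f"Five responders:  ${hypothetical:.2f}")
```

Swapping in your own responder counts, resolution times, and hourly rate gives a rough baseline to benchmark against.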

When considering a solution to help you advance your operational maturity, it’s important to understand these costs and how to benchmark progress against them. When you’re evaluating the usefulness of a digital operations management platform, you can consider looking at metrics such as:

  • Number of inactionable alerts per month: Nobody enjoys being paged. Yet it’s even more frustrating to be paged for an alert that’s irrelevant than for one that needs to be acted upon. As the number of inactionable alerts per month decreases, alert fatigue should decrease as well.
  • Number of incidents per month: Fewer incidents means fewer interruptions for your teams. This doesn’t mean that less will break; instead, it means that fewer duplicate incidents will be kicked off for the same problem, or more incidents can be solved via automation without any human intervention.
  • Average mean time to acknowledge (MTTA): This measures how quickly someone takes ownership of an incident once it is triggered. With service-based architecture and a culture of full-service ownership, teams avoid a “tragedy of the commons” and always know which services they’re accountable for.
  • Average mean time to resolve (MTTR): This is one of the most commonly observed signals of improvement. Automation and ML (machine learning) can help your team resolve faster and concentrate on the parts of incident response that are uniquely in the hands of humans.
  • Number of escalations per month: Getting the right information to the right subject matter expert immediately will result in fewer escalations. This helps increase autonomy within teams, and having fewer engineers pulled into an incident means less cost to the business.
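As a rough illustration of how the metrics above could be tracked, the sketch below computes them from a list of incident records. The `Incident` record, its field names, and the sample data are all hypothetical; this is not PagerDuty's data model or API, only one way to structure the calculation. For simplicity it counts inactionable incidents rather than individual alerts.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    """Hypothetical incident record; field names are assumptions."""
    triggered_at: datetime
    acknowledged_at: datetime
    resolved_at: datetime
    escalations: int = 0
    actionable: bool = True


def summarize(incidents: list[Incident]) -> dict[str, float]:
    """Summarize one month of incidents into the metrics listed above."""
    if not incidents:
        raise ValueError("need at least one incident to summarize")
    # Mean time to acknowledge and to resolve, both in minutes.
    mtta = mean(
        (i.acknowledged_at - i.triggered_at).total_seconds() / 60
        for i in incidents
    )
    mttr = mean(
        (i.resolved_at - i.triggered_at).total_seconds() / 60
        for i in incidents
    )
    return {
        "incidents": len(incidents),
        "inactionable": sum(1 for i in incidents if not i.actionable),
        "mtta_minutes": mtta,
        "mttr_minutes": mttr,
        "escalations": sum(i.escalations for i in incidents),
    }


# Made-up sample data for illustration only.
start = datetime(2021, 9, 1, 9, 0)
sample = [
    Incident(start, start + timedelta(minutes=4),
             start + timedelta(minutes=120), escalations=1),
    Incident(start, start + timedelta(minutes=6),
             start + timedelta(minutes=132), actionable=False),
]
summary = summarize(sample)
print(summary)
```

Running the same summary month over month gives a simple trend line for each metric, which is usually more informative than any single month's value.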

Keep in mind that as you track improvements on these metrics, they help more than just the bottom line. Normalizing metrics-driven approaches and adopting modern incident response practices helps to build a foundation for cultural change that can help drive improvement to operational maturity.

Building Culture from Process

Behind the complex technological ecosystem are teams of humans who own and support these services. These humans rely on the right practices and processes to help them respond to incidents with as little stress and toil as possible. Manual and reactive organizations often suffer from a lack of processes, or their processes have yet to be updated to reflect the increasing complexity of their systems.

Operational maturity doesn’t improve overnight. The initiatives that lift teams from manual, reactive processes towards a more proactive posture require investment in cultural change. There is no silver bullet here: change is hard, but the payback, in the form of improved organizational resilience and team morale, is well worth the investment.

Having the right tooling in place can help teams follow best practices more regularly. For instance, when adopting a full-service ownership model, it’s much easier to know who is on call for that service if the tool you use can automatically trace the service owners to the respective service.

Additionally, while incident response processes can be toilsome, adding automation can help speed resolution along and remove cognitive toil for the responders.

Postmortems reinforce a blameless culture and promote psychological safety. By giving teams the space to examine systemic failures and find ways to fix them moving forward, you provide them the breathing room to make mistakes and learn from them.

These processes build a better culture, help alleviate toil, and can prevent potential burnout. With the correct tooling in place to support them, teams can advance in operational maturity while making their jobs easier. And the return on a cultural investment is always as rewarding as (if not more so than) any other investment you can make. Happier, healthier teams produce higher quality work and are fresh enough to respond to incidents with their full cognitive ability.

The more operationally mature a company is, the more likely it is to have both the tooling and the processes in place to make this happen. We share why digital operations maturity is a critical step to becoming an innovative, resilient organization, and how to have discussions about ROI, in our webinar, “Perspectives on Digital Operations: Unlocking ROI and Operational Maturity.” Learn from Senior Director of Solutions Marketing Lauren Wang and Business Value Consultant Mark Gabbard as they discuss framing out operational maturity, PagerDuty’s own methodology for conducting business value studies with our customers, and what we’re hearing from our user community and customer base on value and operational maturity.

Register to watch this webinar on demand.