August 24, 2017
Outages are typically expensive for your business, not to mention a headache for your team. Improving your processes and tools for responding to critical incidents can generate a lot of business value. However, you may be wondering how to prioritize projects and investments, or you may be asked by business stakeholders to justify the costs of new initiatives. Return on Investment (ROI) is one way to do that, but it can be difficult to determine. While the answer is unique to each company, we’ll share a few different approaches to calculating your incident management ROI.
First, let’s clarify what we mean by Return on Investment (ROI). It’s calculated using the formula (dollar value obtained) / (dollars spent). Most businesses can easily calculate the costs they incur. However, determining value can be tricky. Below, we’ll share three ways to determine how much business value is created by new processes or tools:
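As a rough sketch, the formula above can be written as a small Python helper. The function names and the dollar figures are hypothetical and chosen only for illustration. The second function shows a commonly used variant that reports the net gain relative to cost, which may be what your finance team expects; confirm which convention they use before quoting a number.

```python
def roi_ratio(value_obtained: float, dollars_spent: float) -> float:
    """ROI as defined in this post: (dollar value obtained) / (dollars spent)."""
    if dollars_spent <= 0:
        raise ValueError("dollars_spent must be positive")
    return value_obtained / dollars_spent


def net_roi(value_obtained: float, dollars_spent: float) -> float:
    """Common variant (an assumption, not this post's formula): net gain / cost."""
    return roi_ratio(value_obtained, dollars_spent) - 1.0


# Hypothetical figures, for illustration only.
value_created = 150_000.0
investment = 50_000.0

print(f"ROI ratio: {roi_ratio(value_created, investment):.1f}x")
print(f"Net ROI:   {net_roi(value_created, investment):.0%}")
```

Whichever convention you pick, use it consistently for every project you compare.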
If your business-critical systems are revenue-generating (e.g., the backend for an eCommerce site), you may be able to calculate the amount of revenue you lose for each minute your site is unable to process transactions. If you can prevent one outage a year, how much additional revenue might your business capture? This calculation can be powerful for justifying the business value of better incident management.
For example, one of our customers loses hundreds of thousands of dollars for each minute their site is down. When they implemented PagerDuty, they were able to save nearly 10 minutes on their resolution time. If they experience one of these outages a year, they’re generating over $1M in revenue that they otherwise wouldn’t have captured. That’s a great measure of business value.
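The arithmetic behind this kind of estimate is simple enough to sketch in a few lines of Python. Every number below is made up for illustration and is not the customer's actual data; the function and parameter names are also hypothetical. Substitute your own revenue-per-minute and resolution-time figures.

```python
def revenue_recovered(revenue_per_minute: float,
                      minutes_saved_per_outage: float,
                      outages_per_year: float) -> float:
    """Estimate annual revenue captured by resolving outages faster."""
    return revenue_per_minute * minutes_saved_per_outage * outages_per_year


# Hypothetical inputs: $5,000 lost per minute of downtime,
# 10 minutes saved per outage, 2 such outages per year.
annual_value = revenue_recovered(5_000.0, 10.0, 2.0)
print(f"Estimated revenue recovered per year: ${annual_value:,.0f}")
```

The result is only as good as the revenue-per-minute estimate, so it may be worth computing it for both a typical hour and a peak hour.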
Another important aspect is customer loyalty and trust. If uptime is important to your customers, you may be able to estimate quantitatively how valuable reliability is to your business. Talk with your business colleagues to understand the reasons customers leave, and see whether outages and downtime are among them. Also, how high is your customer satisfaction? Could the business add more customers, or would current customers invest more time and money in your business, if you had fewer infrastructure incidents? Your Marketing, Sales, or Customer Support teams may have the answers.
If you give customers service credits for diminished performance during a severe incident, avoiding that cost could be another way to back into an ROI calculation. How much do these service credits cost, and how many times does your business have to provide them? If you could prevent one of these severe events from happening every year, you can use those savings as the “value created” side of your ROI calculation.
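One way to put a number on avoided service credits is sketched below. It assumes credits are issued as a fraction of each affected customer's monthly fee, which may not match your contracts. All names and figures are hypothetical.

```python
def credit_cost_per_event(affected_customers: int,
                          avg_monthly_fee: float,
                          credit_fraction: float) -> float:
    """Cost of the service credits issued for one severe incident (assumed model)."""
    if not 0.0 <= credit_fraction <= 1.0:
        raise ValueError("credit_fraction must be between 0 and 1")
    return affected_customers * avg_monthly_fee * credit_fraction


def credits_avoided(cost_per_event: float, events_prevented_per_year: float) -> float:
    """Annual savings from preventing severe, credit-triggering incidents."""
    return cost_per_event * events_prevented_per_year


# Hypothetical inputs: 40 affected customers paying $2,000 per month,
# each credited 25% of a month's fee, and one such event prevented per year.
per_event = credit_cost_per_event(40, 2_000.0, 0.25)
print(f"Credits per severe event: ${per_event:,.0f}")
print(f"Value created per year:   ${credits_avoided(per_event, 1):,.0f}")
```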
Additionally, by streamlining your incident response processes, you may be able to automate previously manual tasks, saving your team time. If you can calculate the hourly cost of your team’s time (we suggest using the “fully-loaded cost” rather than just salary) and the amount of time saved, you can arrive at the overall cost savings to the business. These cost savings are another way your improved incident management processes generate business value.
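The time-savings calculation can be sketched as follows. The model of fully-loaded cost used here (salary plus an overhead percentage for benefits, taxes, and equipment, spread over 2,080 working hours a year) is an assumption; your finance team may define it differently. All names and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Responder:
    annual_salary: float
    overhead_rate: float  # assumed: benefits, taxes, equipment as a fraction of salary

    def fully_loaded_hourly_cost(self, hours_per_year: float = 2080.0) -> float:
        """Hourly cost including overhead, not just salary."""
        return self.annual_salary * (1.0 + self.overhead_rate) / hours_per_year


def annual_time_savings(responder: Responder,
                        hours_saved_per_incident: float,
                        incidents_per_year: int,
                        responders_per_incident: int) -> float:
    """Dollar value of responder hours freed up by a faster process."""
    hours_saved = hours_saved_per_incident * incidents_per_year * responders_per_incident
    return hours_saved * responder.fully_loaded_hourly_cost()


# Hypothetical inputs: $104,000 salary with 25% overhead, half an hour
# saved per incident, 200 incidents a year, 3 responders per incident.
engineer = Responder(annual_salary=104_000.0, overhead_rate=0.25)
print(f"Fully-loaded hourly cost: ${engineer.fully_loaded_hourly_cost():.2f}")
print(f"Annual savings: ${annual_time_savings(engineer, 0.5, 200, 3):,.0f}")
```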
Finally, you may have to consider the age-old “build vs. buy” dilemma. In many cases, it can be less expensive to use third-party solutions than to build them in-house. Building an in-house system to streamline your IT response and manage on-call schedules is expensive, and it draws your team’s time and energy away from handling incidents. What happens when your in-house system fails? Not only would you miss alerts for your own customer-facing products, but your team would also be pulled away to repair and maintain the system. Your team is only as effective as its incident management system is reliable, and a dedicated third-party system saves both time and money while lowering the team’s overall stress. According to a commissioned economic impact study conducted by Forrester Consulting, PagerDuty helped a customer avoid the estimated $218,400 annual expense of building an automatic notification system internally.
Risk mitigation is like insurance. We pay a premium to insulate ourselves against large unanticipated expenses if something were to go wrong with our house, our car, or our health, even though we may never need the service we’re paying for. You probably pay for backup storage because the costs of losing data would be very high for your business – that’s risk mitigation.
What are some risks to your business that you’re reducing by preventing outages? Customer or business partner trust may be one, or perhaps you incur significant costs to recover data. If you’re able to reduce the risk that the team will miss a critical incident that leads to an outage, you are creating business value.
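One common way to put a dollar figure on reduced risk is to compare the expected annual loss before and after the improvement, where expected loss is the probability of the event multiplied by its cost. This is a generic risk-analysis technique rather than a method described in this post, and the probabilities and costs below are hypothetical.

```python
def expected_annual_loss(probability_per_year: float, cost_if_it_happens: float) -> float:
    """Expected yearly cost of a risk: probability times impact."""
    if not 0.0 <= probability_per_year <= 1.0:
        raise ValueError("probability_per_year must be between 0 and 1")
    return probability_per_year * cost_if_it_happens


def risk_reduction_value(p_before: float, p_after: float, cost_if_it_happens: float) -> float:
    """Value created by lowering the chance of a costly incident."""
    before = expected_annual_loss(p_before, cost_if_it_happens)
    after = expected_annual_loss(p_after, cost_if_it_happens)
    return before - after


# Hypothetical inputs: a missed critical alert would cost $400,000, and better
# incident management halves its yearly likelihood from 25% to 12.5%.
value = risk_reduction_value(0.25, 0.125, 400_000.0)
print(f"Expected loss avoided per year: ${value:,.0f}")
```

Like an insurance premium, this value is worth comparing against what you pay for the tooling or process change that reduces the risk.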
In summary, there are many different ways IT teams can calculate the ROI of better incident management solutions, and it’s important to focus on the goals that are most relevant to your business. A commissioned study by Forrester Consulting shows how one of our customers achieved a 448% ROI with PagerDuty.
Read the study to learn more.
 The Total Economic Impact™ Of PagerDuty, a commissioned study conducted by Forrester Consulting on behalf of PagerDuty, September 2014