PagerDuty Blog

AIOps and Automation: A Conversation Featuring Guest Speaker Carlos Casanova, Forrester Principal Analyst

At the beginning of 2023, I had a great conversation with Carlos Casanova, a Forrester Principal Analyst, in a recent webinar about how AIOps can help drive successful organizational change. According to our conversation, Carlos has divided the AIOps market into two camps: technology-centric (primarily APM/Observability players) and process-centric. PagerDuty is a process-centric solution leveraging multiple technologies.

With process-centric AIOps solutions, organizations gain additional context and insights into  their data. This reduces the time to act, helps improve data quality, enhances decision-making, improves routing and notification efficiency, and ultimately increases the value of services delivered by IT.

This ability to increase speed with greater context shrinks the time for critical incidents. An important thing to note is that the initial routing can be to a virtual operator. Meaning that automation could drive additional triage/debug information or potentially complete a fix before engaging a human responder.

Throughout our conversation, Carlos and I kept returning to the theme of creating better context for responders. When I asked him about what capabilities he sees as most important for solving core AIOps use cases, he said, Quickly identifying the correlation across disparate alerts drastically reduces the noise that individuals are dealing with. Providing all impacted individuals with this clean data signal is vital to improving operations. With this data, individuals can more easily and quickly garner insight into what is truly going on in the environment. They can then quickly determine the right actions to take, decide who needs to be involved for faster remediation, and reduce the amount of effort necessary, which frees up time for other events and alerts.

But teams often struggle with getting started. We agreed that the cost of waiting and planning probably isn’t worth the cost of starting and iterating. He added “The overall initiative may look daunting, but there are achievable quick wins. Waiting is not recommended. Start with small tactical efforts that roll up to your larger and longer-term strategic goals to show progress, demonstrate value, and build momentum.”

So speed is also a continuous theme: quickly getting context, rapidly responding with automation, and starting the process immediately to see these wins. But we also know that the pressure has continued to grow. 

Teams have been affected by the economic downturn and slowdown. When I asked him about how teams can increase efficiency and measure success, we spoke about automation being key to success.

Carlos responded, “Simple scenarios that occur often are great candidates for automating all or part of their remediation. Fully or even partially automating five or 10 simple scenarios instantly frees up large amounts of time for individuals to focus on the more complex scenarios that organizations might not feel comfortable automating.”

But we also have to recognize the forming, storming, and norming before we get to performing in projects. There will be changes to how we measure and think about success that we have to embrace. 

“AIOps can also empower IT to alleviate workloads to help their delivery teams ‘do more with less.’ It’s important to remember that these changes invalidate existing metrics. You must establish new baselines, since individuals will no longer be performing the simple and low-level actions. For example, a technician manually resolves 300 incidents per week. Thirty are simple and have easily automated remediations. The MTTR on these might drop by 90%. Elimination of the simple incidents, however, only allows the technician to take on 10 medium-complexity incidents in their place. That means the technician will handle 20 fewer incidents per week. The average MTTR for the technician will go up, and incidents will stay in their queue longer, with a higher ratio of medium- and high-complexity incidents,” Carlos said.

One of the most common questions I run into is how to get started. Traditionally, AIOps is viewed as a potentially years-long initiative. It can be daunting to begin the journey with so much uncertainty and change. PagerDuty has greatly simplified the process by crafting a one-click process for event correlation so teams can see value immediately but this isn’t the end of the journey to AIOps. 

Carlos shared his insights on getting started, as well as facing the reduction in available OpEx. “Budgets are always a challenge, but to a large extent, you can overcome that hurdle by demonstrating and clearly articulating the value of AIOps. Develop a narrative for your business case that speaks to the value of improved experiences with the organization. Demonstrate how improved routing and notifications with enhanced contextually relevant data enables the same workforce to handle more workloads with less effort. Explain how patterns and trends empower lower-level resources to execute more advanced actions because they are provided suggestive actions that are based on the more experienced and senior staff members. All of this helps organizations deal with the economic challenges they’re currently facing while also improving the quality of products and services they deliver. It’s important for organizations to demonstrate their chosen solution has a fast time to value. For example, to improve user experiences, how quickly can the solution provide complete visualizations of transactions to support personnel to resolve an outage? To provide a faster response time, how quickly can the solution analyze the environment and correlate new alerts into singular incidents that can be handled immediately or in an automated fashion? Time to value is vital in difficult economic times.”

Time to value can be even more important than ROI for many of our customers. Speed is what will delineate winners and losers in digital battlegrounds. How quickly we can deal with inevitable issues and iterate improvements is what sets teams apart from competitors and provides an excellent customer experience.

As I&O leaders work through economic uncertainty that’s forcing them to cut costs and do more with less, they require new tools and approaches that help them scale and optimize their existing resources. AIOps provides teams with a reliable way to process high volumes of data and events, manage routing and response in real-time, and help teams resolve incidents faster. If you’re interested in learning how to tackle those challenges for your business, watch this webinar to hear the rest of my conversation with Carlos.