What Is AIOps and Why Should I Care?

by Jerry Weltsch December 3, 2020 | 5 min read

Artificial intelligence for IT operations (AIOps) means different things to different people, so a definition of what it is and what it does is difficult to nail down. In an age where digital acceleration is priority zero, companies are evaluating cultural shifts toward new operating models, such as service ownership, to unlock efficiency in a complex world of hybrid cloud environments. Against that backdrop, AIOps emerges as an attractive potential investment to solve central IT aches and pains. But what is it, and what can it actually do for you?

451 Research Senior Analyst Nancy Gohring knows the difficulty of defining AIOps as well as anyone: she has been conducting a series of surveys with IT operations and developer professionals to understand how they view AIOps and how they may apply it. We asked Nancy to dive a bit deeper by interviewing some of those professionals, and she found that their responses were all over the place.

From her research on this topic, one thing she could declare is that AIOps can be broadly defined as any tool in the monitoring and incident response tool chain that uses artificial intelligence and/or machine learning (AI/ML).

With this definition in mind, Nancy offered some suggestions as to what to look for when evaluating AIOps tools and solutions.

Embrace Potential Benefits From AI/ML, but Don’t Be Distracted by AIOps Marketing

Look for solutions that make it easier to adopt AI/ML for alerting noise reduction, such as tools that:

  • Have pre-trained machine learning models that allow you to get started within days instead of months
  • Can work with on-premises, cloud-based, and hybrid infrastructures
  • Standardize data formats from multiple sources to integrate a disparate set of monitoring tools
  • Use machine learning in addition to rules-based approaches to ensure useful results
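To make the data-standardization point concrete, here is a minimal sketch of mapping alerts from two monitoring tools onto one common schema and then applying a simple rules-based noise filter. Everything in it is an assumption for illustration: the tool names, the field names, the severity vocabulary, and the deduplication rule are hypothetical and do not describe any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """Common alert schema shared by every source (hypothetical)."""
    source: str
    service: str
    severity: str
    summary: str


# Hypothetical severity vocabularies for two made-up monitoring tools.
SEVERITY_MAP = {"crit": "critical", "P1": "critical",
                "warn": "warning", "P3": "warning"}


def from_tool_a(raw: dict) -> Alert:
    # Tool A (hypothetical) uses the keys svc / level / msg.
    return Alert("tool_a", raw["svc"], SEVERITY_MAP[raw["level"]], raw["msg"])


def from_tool_b(raw: dict) -> Alert:
    # Tool B (hypothetical) uses service_name / priority / title.
    return Alert("tool_b", raw["service_name"],
                 SEVERITY_MAP[raw["priority"]], raw["title"])


def dedupe(alerts):
    """Rules-based noise reduction: keep the first alert per (service, severity)."""
    seen = set()
    kept = []
    for alert in alerts:
        key = (alert.service, alert.severity)
        if key not in seen:
            seen.add(key)
            kept.append(alert)
    return kept


alerts = [
    from_tool_a({"svc": "checkout", "level": "crit", "msg": "5xx rate high"}),
    from_tool_b({"service_name": "checkout", "priority": "P1",
                 "title": "Error budget burning"}),
    from_tool_b({"service_name": "search", "priority": "P3",
                 "title": "Latency elevated"}),
]
kept = dedupe(alerts)
for alert in kept:
    print(alert.source, alert.service, alert.severity)
```

Once every source speaks the same schema, the two "checkout" alerts collapse into one. A machine learning model could then work on the same normalized records alongside rules like this one.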

Look to the Past

Evaluate tools and solutions that leverage data from past actions of responders to better inform future actions and responses. Additionally, look at solutions that enable auto-remediation to more rapidly resolve incidents.

It’s Not Just About Technology: Don’t Forget That People and Processes Are Key

Getting the right person to respond at the right time is becoming more difficult with increased complexity introduced by the use of microservices and DevOps practices, so having a solution that can alert the right person at the right time is critical.

Think Big to Make the Business Case

Reducing mean time to acknowledge (MTTA) and mean time to resolve (MTTR) are great goals for an IT operations team, but what do they really mean for the business? When making the business case for an AIOps solution that includes incident response, be sure to address the improved business outcomes as well. These outcomes include downtime avoidance or downtime reduction, which translates to improved customer experiences and revenue protection, as well as increased productivity from developers and operators, who can now spend less time on unplanned work.
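For readers who want to see how the two metrics are typically calculated, here is a minimal sketch. The incident records are made-up sample data, and the sketch assumes both intervals are measured from the moment an incident is triggered. Teams define the start and end points differently, so treat this as one possible convention rather than a standard.

```python
from datetime import datetime
from statistics import mean

# Made-up sample incidents; the timestamps are illustrative only.
incidents = [
    {"triggered": "2020-12-01T10:00",
     "acknowledged": "2020-12-01T10:04",
     "resolved": "2020-12-01T10:40"},
    {"triggered": "2020-12-02T15:00",
     "acknowledged": "2020-12-02T15:10",
     "resolved": "2020-12-02T16:20"},
]


def minutes_between(start: str, end: str) -> float:
    """Elapsed minutes between two ISO 8601 timestamps."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds() / 60


# MTTA: average time from trigger to acknowledgement.
mtta = mean([minutes_between(i["triggered"], i["acknowledged"])
             for i in incidents])

# MTTR: average time from trigger to resolution (assumed convention).
mttr = mean([minutes_between(i["triggered"], i["resolved"])
             for i in incidents])

print(f"MTTA: {mtta:.1f} min")  # MTTA: 7.0 min
print(f"MTTR: {mttr:.1f} min")  # MTTR: 60.0 min
```

Tracking these numbers before and after a change gives the operations side of the story. The business case still needs them tied to outcomes such as downtime avoided.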

Unifying Data and Processes Can Improve Incident Response

Centralizing alerting data from monitoring tools on a single platform allows distributed teams to better orchestrate an effective incident response and drive a more collaborative approach, resulting in improved staff morale and productivity.

Embrace Automation

Automation is not only about remediation—which may take time to adopt for many—but it is also useful for removing the toil of incident response by automating specific tasks in the incident response process. These tasks include alerting the right person at the right time, setting up a response team teleconference, accessing the right runbook, communicating status updates to business stakeholders, and generating incident postmortem reports.

PagerDuty agrees with Nancy’s conclusion in her 451 Research report that just buying the right set of AIOps tools is not a silver bullet. To make the most of what these tools have to offer, you need to make them part of a comprehensive strategy for addressing event management and incident response.

PagerDuty believes, and practices the belief, that such a strategy should include an assessment of how your organization’s teams own and operate their services. When considering new technologies like AIOps, it’s especially important to understand how they fit into your existing operating models. As businesses have increasingly moved to the cloud to capture better scale and agility, technical organizations have been evolving to support more and more applications and microservices in increasingly hybridized environments.

This uptick in complexity across technology also means changes to people and their corresponding processes. Teams are increasingly taking on a decentralized form, where lines of business often staff their own technology teams, each with its own culture, velocity, and toolchain. IT leaders looking to purchase AIOps solutions should keep both centralized teams and decentralized teams (where developers individually own and maintain their code in production) in mind, so that the tools are actually used and deliver the expected return on investment.

Download this report from 451 Research to read more about the lessons you can learn from Nancy Gohring and how PagerDuty AIOps can help you and your organization make the transformation to DevOps and full-service ownership.