HybridOps, an amalgamation of CentralOps and DevOps, is gathering steam in the enterprise world. Historically, large enterprise organizations had Operations Centers (also known as the Service Operations Center, Technical Operations Center, or Network Operations Center: the SOC, TOC, and NOC, respectively), which acted as the central point of orchestration for major incident response.
This approach worked for many years, but with the rise of the DevOps movement, the CentralOps model began to conflict with the way a new wave of developers, who had both development and operations skills, actually worked. As more enterprise companies realized they needed to meld CentralOps and DevOps, it became increasingly important to have a mechanism in place to make that digital transformation succeed.
The Importance of a Coordinated Strategy
As with any major transition, one of the challenges of migrating to HybridOps is gauging the impact the transition has on the most important asset in IT Operations: its people.
Until recently, telemetry in IT Operations (ITOps) focused almost entirely on servers, applications, and services. What was missing was a way to assess the impact of ITOps notifications on responders and their families.
For example, if a responder is woken up several times a night for several nights in a row because new services are being deployed or existing services updated, that responder is at elevated risk of burnout or attrition. And because enterprise companies today have only 19 percent of the resources they need to successfully carry out a digital transformation (such as moving to HybridOps), employee attrition is a significant problem.
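To make "woken up several nights in a row" concrete as a people-focused metric, here is a minimal sketch that takes page timestamps per responder and flags anyone interrupted during sleep hours on several consecutive nights. Everything here is a hypothetical assumption for illustration: the function names, the 22:00 to 07:00 sleep window, and the three-night threshold. This is not how PagerDuty's service computes anything.

```python
from __future__ import annotations

from datetime import date, datetime, timedelta

# Hypothetical sleep window in the responder's local time: 22:00 to 07:00.
SLEEP_START_HOUR = 22
SLEEP_END_HOUR = 7


def night_of(ts: datetime) -> date | None:
    """Return the date on which the interrupted night began, or None for daytime pages."""
    if ts.hour >= SLEEP_START_HOUR:
        return ts.date()
    if ts.hour < SLEEP_END_HOUR:
        # An early-morning page belongs to the night that started the previous evening.
        return ts.date() - timedelta(days=1)
    return None


def longest_interrupted_streak(pages: list[datetime]) -> int:
    """Length of the longest run of consecutive nights with at least one sleep-hour page."""
    nights = sorted({n for n in (night_of(p) for p in pages) if n is not None})
    longest = current = 0
    prev: date | None = None
    for n in nights:
        # Extend the streak only if this night directly follows the previous one.
        current = current + 1 if prev is not None and n - prev == timedelta(days=1) else 1
        longest = max(longest, current)
        prev = n
    return longest


def at_risk_responders(
    pages_by_responder: dict[str, list[datetime]], streak_threshold: int = 3
) -> list[str]:
    """Responders whose interrupted-night streak meets the (hypothetical) threshold."""
    return sorted(
        responder
        for responder, pages in pages_by_responder.items()
        if longest_interrupted_streak(pages) >= streak_threshold
    )
```

For instance, pages at 23:15 on March 1, 02:40 on March 3 (which counts toward the night of March 2), and 23:05 on March 3 form a three-night streak, so that responder would be flagged. A daytime page is ignored.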
Without a coordinated migration strategy, there is also a real risk of siloed operations. For example, DevOps teams frustrated by the centralized control of the NOC/SOC/TOC may put AWS EC2 instances and a Datadog subscription on a company credit card and manage their infrastructure and monitoring on their own. The flaws in this approach become apparent in two ways:
- In an enterprise environment, problems with the infrastructure that team supports quickly become highly visible to the business.
- During a major incident, business stakeholders (such as the executive team) typically call the NOC for updates on remediation efforts. In the siloed organization described above, that team's infrastructure is invisible to the NOC, so the NOC cannot provide relevant updates, which may increase the time to resolution.
Contrast that with an organization that has successfully implemented HybridOps, eliminating the silos, and uses a rational system for distributing load between the NOC and DevOps teams. In this scenario, the NOC routes incoming alerts to the appropriate DevOps teams, and each team handles its own incidents. The NOC can see whether an incident is cascading across multiple teams and keep stakeholders updated, while the DevOps teams focus their energy on remediation. The result? A dramatically improved flow of alerts, incidents, and incident resolution.
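The division of labor described above has two parts. The NOC routes each alert to the team that owns the affected service, and it watches for incidents that span more than one team. The following is a minimal sketch of that split under stated assumptions. The service names, team names, `SERVICE_OWNERS` map, and fallback-to-NOC rule are all hypothetical, and this is not a description of any particular product's routing engine.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical ownership map maintained by the NOC: service name -> owning DevOps team.
SERVICE_OWNERS = {
    "checkout-api": "payments",
    "payment-gateway": "payments",
    "search": "discovery",
    "inventory-db": "platform",
}
DEFAULT_TEAM = "noc"  # Alerts for unowned services fall back to the NOC itself.


@dataclass(frozen=True)
class Alert:
    incident_id: str
    service: str
    summary: str


def route(alert: Alert) -> str:
    """Return the team that should receive this alert."""
    return SERVICE_OWNERS.get(alert.service, DEFAULT_TEAM)


def cascading_incidents(alerts: list[Alert]) -> dict[str, set[str]]:
    """Map each incident that touches more than one team to the set of teams involved."""
    teams_by_incident: dict[str, set[str]] = {}
    for alert in alerts:
        teams_by_incident.setdefault(alert.incident_id, set()).add(route(alert))
    # Incidents confined to a single team stay with that team; the rest need NOC coordination.
    return {inc: teams for inc, teams in teams_by_incident.items() if len(teams) > 1}
```

In this design, each DevOps team sees only the alerts `route` sends it, while the NOC uses `cascading_incidents` to decide which incidents need cross-team coordination and stakeholder updates.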
Operations Health Management Service
Choosing and implementing any ITOps model is fraught with complexity, particularly in an enterprise environment. Without tooling and telemetry from a vendor that understands the depth of the problem and has built the required capabilities, digital transformation carries a much larger risk.
PagerDuty’s Operations Health Management Service is the industry's first offering that provides telemetry about the health and well-being of people in ITOps. Business and technical leaders gain a deep understanding of their operations infrastructure, along with specific recommendations for improvement, as seen through the lens of their people’s health. Enterprises that implement these recommendations can achieve true HybridOps, resulting in happier employees, higher retention rates, and measurably improved digital service delivery.