Claranet Partners With PagerDuty to Achieve Real-Time Operations

Size: 1,001-5,000 Employees

Industry: Technology

Location: London, UK

Customer Since: 2016

Key Integrations:

AppDynamics
AWS EventBridge
Google Cloud
Grafana
JIRA
Microsoft Azure
New Relic
Prometheus
ServiceNow
Slack
Zabbix

Founded in 1996, Claranet is an IT Service Management company that provides network, hosting, and managed application services to organizations around the world. With customer experience at the center of its mission, Claranet helps bridge the technology gap for its customers by delivering tooling, automation, and IT services, freeing them to focus on innovation while continuing their in-house development and maintenance.

Andrew Rundle, a Principal Engineer at Claranet, is part of the Group Engineering team that oversees Claranet’s infrastructure and operations services, specifically around hosting within its own data centers and the public cloud. Responsibilities for his team range from deploying servers and containers to managing the application experience and DevOps processes for their customers. “Our team’s goal is to reduce our customers’ costs and help them build a more efficient operation while also introducing new technologies, products, and services,” explained Rundle.

A Growing Network Brings Growing Pains

Claranet went through a phase of rapid growth stemming from several business acquisitions and almost tripled its employee count over a few years. This growth led to the addition of several new IT teams to Claranet, as well as an influx of new customers, applications, and tools to support.

This internal and external growth, coupled with the need to integrate new operating models with existing IT processes, created several new challenges, including:

  • Responder burnout stemming from unbalanced on-call schedules and rotations
  • Difficulty maintaining customer SLAs because of communication issues driven by the influx of new teams and technologies
  • Technology sprawl from adding new teams, tools, and services to the organization
  • Delays in acknowledging support calls, which negatively impacted MTTR and reporting capabilities
  • Inefficiencies due to monolithic monitoring systems, manual processes, and siloed workflows

Because of the growth in new customers, products, and services, Claranet’s Group Engineering team needed an end-to-end incident management platform to properly acknowledge, respond to, and resolve incidents before they affected internal and external customers. “Our teams were getting calls four or five times a night during off-hours for one product. This was causing response delays, fatigue, and frustration for our team. Some of our engineers were leaving because the existing model just wasn’t sustainable,” shared Rundle.

Automating the Manual Work

Before PagerDuty, Rundle’s teams were using local Network Operations Center (NOC) resources to field incoming alerts, which was a manual process that relied on multiple human interactions before an incident reached the designated responder. Some of these teams and regions had centralized NOCs, while other regions took a DevOps and SRE approach to engineering operations, leading to a HybridOps model within the company. As a result, teams found it difficult to break down silos and ensure a degree of standardization and technology adoption across their monitoring stack.

Resources were exhausted by the influx of calls. At the same time, the local NOCs weren’t properly escalating alerts to the Group Engineering team as they came in, because they weren’t fully aware of the severity of the incidents those alerts were associated with. “NOC teams would receive off-hour alerts and not notify our team until the following morning, which became problematic when more severe incidents within our services occurred,” shared Rundle. The reliance on manual processes and human interaction created a bottleneck in the response process and increased MTTR.

With PagerDuty Live Call Routing, Rundle’s team has created a self-service model that automatically routes incoming incidents to the right responders at the right time, so they can respond quickly and efficiently. Claranet uses PagerDuty Live Call Routing in two distinct ways:

  • Internal: When incidents or events occur that monitoring systems don’t initially capture, or when specific teams are needed for a platform-specific incident, the right teams can be notified immediately to orchestrate a proper response.
  • External: Some customers have a direct line of communication connected to the Claranet on-call team so they can escalate major incidents straight to the right responders when necessary.

“We’ve essentially gotten to the point now where we don’t have to rely on that human interaction anymore because of Live Call Routing. And over time, other teams across the organization have continued to adopt it because of its self-service domain,” explained Rundle.

Benefits With PagerDuty

Claranet has deployed PagerDuty across several globally distributed teams within the organization, including the Network, Security, and Engineering teams. Rundle’s team uses PagerDuty’s integration with Slack to communicate quickly and seamlessly about incidents as they are managed and resolved, while also giving stakeholders, such as the executive team, full visibility into each incident’s current status. “Before PagerDuty, we had to individually reach out to people to ask what was going on, but with the Slack integration, we see everyone’s alerts and we can actually analyze correlations across the platform,” he shared.

Additionally, PagerDuty has improved data management and reporting on the incident management process for key stakeholders and leadership teams. “PagerDuty helps us from the data perspective because you can actually see the data, take it to management and say, ‘Look, this is worth investing time and money,’” explained Rundle.

With PagerDuty, Claranet’s regional teams have the autonomy to use the platform in whatever way best fits their existing processes. “PagerDuty is a simple, slick application that ultimately allows our teams to reduce their workload and really see the impact through the data we get from it,” shared Rundle.

Claranet has seen several other benefits with PagerDuty, including:

  • Improvements in MTTR from removing manual work and adding automation to the incident response process
  • Faster response to and resolution of incoming alerts with PagerDuty Live Call Routing
  • Reduced operational costs and increased service availability due to new process efficiencies
  • Greater visibility for key stakeholders into on-call performance and incident resolution with analytics and data reporting
  • A central point of ingestion that aggregates all of their monitoring data through PagerDuty’s ecosystem of 350+ integrations

“Having PagerDuty as that central aggregation layer saves us time by not having to go and build monitoring system integration and cookie cut everything by service.”

Andrew Rundle, Principal Engineer, Claranet

Looking Into the Future

Claranet plans to continue expanding the use of PagerDuty across teams throughout its global organization, including the Infrastructure, Public Cloud, and Security teams across the group. “We want to be far more proactive and leverage even more automation to predict what’s really going on and reduce as much noise as we can,” shared Rundle. His team is also looking at implementing PagerDuty Event Intelligence to better understand how incidents are composed and how to improve the response process across the organization.

To learn more about how PagerDuty is helping global companies with digital operations management, try PagerDuty today.