In 1991, Packy Hyland Jr. convinced a Wisconsin bank it could save printing costs by storing reports on optical disks. That early innovation became OnBase and set Hyland Software on its way to being a leading provider of data processing, storage, and management.
A universal enterprise information platform, OnBase centralizes business content in one secure location. It then delivers relevant information whenever and wherever it’s needed, increasing productivity, supporting excellent customer service, and reducing risk.
Hyland serves over half of the Fortune 100 among its thousands of global customers, so it’s critical for its infrastructure team to ensure uptime of these cloud-based technologies, solutions, and services.
The infrastructure team struggled to get actionable information to the right responders. “Prior to PagerDuty, we had multiple monitoring solutions that would deliver alerts in various ways,” explained Brian Long, Observability Engineer. “We had difficulty getting the correct information to the correct team, or alerts were delivered in fixed formats that didn’t necessarily give pertinent information front and center.”
For example, when the team needed to be notified about version retirements, alerts came in as a giant block of unformatted text. The information wasn’t easy to consume and lacked key details: which instance was affected, which endpoint was being retired, and what work needed to be done. Even experienced responders needed extra time and effort to dig in and understand the problem.
In addition, triage and cross-team escalations were inconsistent and at times ineffective, resulting in slow or clunky collaboration. “Many of the processes that worked during the normal workday schedule, such as reaching out to those teams through Slack, weren’t reliable if those teams were off hours, or if the response was handled by a 24/7 team that then needed to escalate to a non-24/7 team,” said Long.
Hyland needed to improve the user experience for engineers, as well as drive faster resolution.
The company turned to PagerDuty AIOps to enrich and normalize event data so responders had better context during incident response. Its Global Event Orchestration feature uses custom logic and nested rules to enrich events and control their routing, or to trigger webhook actions based on event conditions.
Global Event Orchestration cuts down on manual work by connecting real-time event processing with intelligent automation. “Leveraging PagerDuty’s Global Event Orchestration has been critical to ensure that our event routing processes are efficient and scalable to optimize IT operations and spend. With Global Event Orchestration, our organization is able to detect the ‘resolved’ condition from our notifications to execute as a resolve and reduce the number of places these conditions need to be configured by at least a factor of three. This frees up our time to focus on innovation, not configuration,” Long said.
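The "resolved" detection Long describes can be pictured with a small sketch. This is not PagerDuty's rule syntax or Hyland's configuration, only an illustration of the idea in Python. The event shape, the pattern, and the function name `normalize_action` are assumptions made for the example. The `trigger` and `resolve` action names follow the PagerDuty Events API v2 convention.

```python
import re

# Hypothetical pattern: assumes the monitoring tool writes "resolved" or
# "recovered" somewhere in the notification text.
RESOLVED_PATTERN = re.compile(r"\b(resolved|recovered)\b", re.IGNORECASE)


def normalize_action(event: dict) -> dict:
    """Return a copy of the event with event_action derived from its summary."""
    normalized = dict(event)
    summary = event.get("summary", "")
    is_resolved = RESOLVED_PATTERN.search(summary) is not None
    normalized["event_action"] = "resolve" if is_resolved else "trigger"
    return normalized


# Illustrative events from two made-up monitoring sources.
events = [
    {"source": "monitor-a", "summary": "Disk usage high on host-1"},
    {"source": "monitor-b", "summary": "RESOLVED: Disk usage high on host-1"},
]
actions = [normalize_action(e)["event_action"] for e in events]
print(actions)  # ['trigger', 'resolve']
```

Because the check lives in one shared function rather than in each integration, the condition only has to be defined once, which is the kind of consolidation the quote refers to.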
Global Event Orchestration helped Hyland fix poorly formatted alerts such as version retirement notices. Each alert is routed to the correct service based on its metadata. By adding Transformations and defining Custom Variables, the team translates hard-to-read machine terms and code into context that helps responders act on the problem. “Using custom variables, we are able to write pieces of text that make the alert information more human and easier to understand,” Long explained. “Now we know what version retirement it is, what account it’s on, and the instance or machine that requires action. The alert responder can then quickly mobilize, identify any additional pieces of information that don’t get sent as part of the payload, and resolve the issue much faster.”
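As a rough illustration of this enrichment step, the Python sketch below extracts variables from an unformatted alert, picks a service from the alert's metadata, and writes a readable summary. Everything in it is hypothetical: the raw alert format, the field names (`ver`, `acct`, `inst`, `endpoint`), the routing table, and the service names are invented for the example and do not reflect Hyland's payloads or PagerDuty's actual configuration.

```python
import re

# Hypothetical raw alert: a single unformatted line of machine text.
RAW = "version_retirement ver=2.1 acct=acme-prod inst=db-07 endpoint=/v2/reports"

# Hypothetical routing table from alert type to owning service.
ROUTES = {"version_retirement": "platform-lifecycle"}


def extract_variables(raw: str) -> dict:
    """Pull key=value pairs out of the raw alert text."""
    return dict(re.findall(r"(\w+)=(\S+)", raw))


def enrich(raw: str) -> dict:
    """Route the alert by its type and build a human-readable summary."""
    parts = raw.split()
    alert_type = parts[0] if parts else "unknown"
    v = extract_variables(raw)
    summary = (
        f"Version {v.get('ver', 'unknown')} retirement on account "
        f"{v.get('acct', 'unknown')}: instance {v.get('inst', 'unknown')} "
        f"requires action (endpoint {v.get('endpoint', 'unknown')})"
    )
    return {"service": ROUTES.get(alert_type, "default-triage"), "summary": summary}


alert = enrich(RAW)
print(alert["service"])
print(alert["summary"])
```

The responder sees the version, account, and instance in one sentence instead of having to parse the raw text, and unknown alert types fall back to a default triage service rather than being dropped.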
Hyland also used PagerDuty to assemble and mobilize cross-functional teams, escalating to additional subject matter experts when assistance is needed and further speeding up resolution. With Response Plays, a responder can run incident actions at the push of a button, escalating directly to the appropriate team based on the escalation policies pre-configured in PagerDuty. Each Response Play has a descriptive name, so the user knows exactly what will happen when they click it. “All actions are tracked on the incident so the person reaching out knows what is going on,” Long said.
PagerDuty has made a significant impact on Hyland’s infrastructure team, helping to ensure an always-on cloud environment for customers. The team has seen improvements that include:
“When we looked at our problems, we saw that we had alerts that potentially needed to go to different teams, the alerts were poorly formatted, and we had hurdles and issues reaching out to other teams,” Long said. “PagerDuty solved all of that for us.”
Watch Brian’s Summit ‘22 session, “Intelligent Delivery and SME Mobilization: Ensuring Effective Alert Distribution and Resolution.”