Tyro Payments Automates Microservices Incident Management with PagerDuty

Size: 371 Employees

Industry: Fintech / Financial Services

Location: Sydney, Australia

Customer Since: 2013

Key Integrations:

New Relic
Sumo Logic
Nagios
PRTG
NodePing
Slack

Tyro Payments, Australia’s leading independent payments provider, processes over $10 billion in transactions annually for more than 19,000 small to mid-size businesses across the country. The company supports more than 200 point-of-sale integrations and promises transaction completion times of under two seconds. Keeping this promise and guaranteeing service uptime requires a robust monitoring and incident management solution, which is why Tyro has leveraged the PagerDuty platform since 2013. With PagerDuty, Tyro manages alerts and notifications for its microservices-based applications and infrastructure.

Challenges: Manual Incident Monitoring and Scheduling

Tyro’s application platform consists of more than 100 microservices that support critical banking operations, and a failure in any one of them could cause a major customer-impacting problem. “If anything fails, customers can no longer accept payments,” said Ed Groenescheij, team lead at Tyro Payments. “It’s critical to us to ensure the platform is always up.”

Before adopting PagerDuty, Tyro’s operations team struggled to detect failures promptly because it relied heavily on manual processes to manage incidents. Alerts were sent by email to on-call engineers, who had to keep checking their inboxes to avoid missing important notifications. Escalating an alert when the on-call engineer did not respond, or could not resolve the incident alone, also required manual intervention. If an incident affected an application and needed developer support, the operations team also had to contact the developers manually.

All of these manual processes were time-consuming and potentially left Tyro’s customers at risk if the operations team could not resolve issues quickly.

Achieving Automation With PagerDuty

Once Tyro’s operations team adopted PagerDuty, the tedium and risk of manual incident management quickly became a thing of the past. “The key thing for us when we started using PagerDuty was the fact that we were able to schedule, automate, and escalate incident response immediately,” Groenescheij said.

PagerDuty has also improved communication between the operations team and the rest of the organization by giving everyone shared visibility into infrastructure and applications. “[Previously], if one of our infrastructure monitoring systems noticed an issue, and at the same time an issue occurred with one of our applications, the application team wouldn’t know that there was an underlying infrastructure issue,” Groenescheij explained. By bringing monitoring data from these systems together, PagerDuty now gives the team a consolidated view of what is happening across its environment.

PagerDuty has also helped Tyro’s engineers to work more efficiently with less stress. Engineers now receive notifications automatically, eliminating worries about missing an important alert. “We’re now able to step back and trust that PagerDuty will wake us up when we need to,” Groenescheij said.

“When we started using PagerDuty… we were able to schedule, automate, and escalate incident response immediately.”

– Ed Groenescheij, Team Lead, Tyro Payments

Expanding Infrastructure Visibility Further With Operations Command Console

Groenescheij and his team plan to adopt more PagerDuty capabilities in the near future, including the Operations Command Console. It will help on-call engineers track relationships between incidents in order to prevent cascading service failures, in which an incident affecting one application or resource disrupts the others that depend on it. The Operations Command Console will also provide a single interface for viewing monitoring data from all of the alerting systems that Tyro has integrated with PagerDuty.

Tyro also expects to extend its use of PagerDuty beyond the operations team to include developers. “We want to ensure that developers gain instant visibility into application issues as they happen rather than relying on the operations team to walk over and tell them about the issue,” Groenescheij said. By bringing developers directly into the incident management process, Tyro will further automate its software delivery and management workflows. In turn, Tyro will be even better positioned to process payments with confidence, knowing that its IT operations and development teams are working together to respond to issues quickly using PagerDuty’s automated incident management features.