Monzo Bank Ltd. is a digital, mobile-only bank based in the United Kingdom with over one million customers. Founded in 2015, Monzo is built from the ground up on a modern technology stack that relies primarily on open-source software and microservices to run its banking operations. The iOS and Android mobile apps are the heart of the bank, with innovative features that provide more convenience for its customers while adhering to the strict standards and regulations that govern traditional bank security and privacy practices.
Because of customer expectations surrounding access and usage of the mobile banking app, managing digital operations at Monzo means ensuring that the app runs faultlessly regardless of platform. “Engineers have more pressure to take action in real time,” Christopher Evans, Platform Team Lead at Monzo, shared. “You can’t go wrong and have one second of downtime. You have even more pressure than a traditional bank.”
Customer Experience Beyond Banker’s Hours
Since it’s a digital bank, Monzo’s hours of operation are 24/7. “Customers are the heart of everything we do,” Evans said. “Our services must always be up and accessible to our customers.” To that end, Monzo does not have scheduled downtime, unlike many traditional banks. To protect the customer experience, Monzo has been using PagerDuty to accelerate response to customer-impacting issues.
“We put a huge amount of engineering effort into making sure we don’t have scheduled maintenance periods, but we’ve had downtime because of incidents, which are inevitable,” Evans explained. “We prioritize customer-impacting systems ahead of everything else. The team’s focus is providing a bank that has a world-class level of availability, and PagerDuty is the hub for mobilizing the right people in real time to fix issues in the fastest possible way. That’s where the value lies.”
Visibility a Huge Asset for Improving Application Performance
As Monzo has grown, its use of PagerDuty has evolved alongside it. “In the beginning, our usage of PagerDuty was minimal,” Evans shared. “The majority of our alerting was done by our on-call engineers watching Slack channels.” Since then, Monzo has re-architected its monitoring and alerting solutions for more visibility into infrastructure health and system performance. The company has moved to the open-source monitoring system Prometheus for everything, from infrastructure monitoring to application performance management (APM). With this shift, Monzo can now monitor the metrics coming out of its applications.
Monzo also configured its use of PagerDuty to gain more analytical insights by leveraging the metrics collected by the platform. Using PagerDuty’s integration with Prometheus, Monzo now sends all of its alerts to the PagerDuty platform so it can better track mean-time-to-action and mean-time-to-resolution. Additionally, because more alerts are flowing through PagerDuty, the on-call team has more context to immediately take action when issues arise.
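To make the two metrics concrete, here is a minimal Python sketch of how mean-time-to-action and mean-time-to-resolution could be computed from incident timestamps. The `Incident` record, its field names, and the sample data are hypothetical illustrations, not Monzo's data or PagerDuty's API. The sketch also assumes that "action" means the first acknowledgement of an alert.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    # Hypothetical record; the field names are illustrative only.
    triggered_at: datetime
    acknowledged_at: datetime
    resolved_at: datetime


def mean_time_to_action(incidents):
    """Average delay between an alert firing and its first acknowledgement."""
    delays = [(i.acknowledged_at - i.triggered_at).total_seconds() for i in incidents]
    return timedelta(seconds=mean(delays))


def mean_time_to_resolution(incidents):
    """Average delay between an alert firing and the incident being resolved."""
    delays = [(i.resolved_at - i.triggered_at).total_seconds() for i in incidents]
    return timedelta(seconds=mean(delays))


# Made-up sample incidents, for illustration only.
incidents = [
    Incident(datetime(2019, 1, 1, 9, 0), datetime(2019, 1, 1, 9, 2), datetime(2019, 1, 1, 9, 30)),
    Incident(datetime(2019, 1, 2, 14, 0), datetime(2019, 1, 2, 14, 4), datetime(2019, 1, 2, 14, 50)),
]

print("Mean time to action:", mean_time_to_action(incidents))
print("Mean time to resolution:", mean_time_to_resolution(incidents))
```

Averages like these only become meaningful once every alert flows through a single system that records each timestamp, which is the benefit the consolidation described above is meant to provide.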
Investing in Team Health
When Evans first joined the platform team, the on-call rotation consisted of only four engineers across the entire business, putting them at risk for burnout. “It was a really stressful experience,” Evans explained. “People wanted to drop out of the rotation.” To expand the pool of available responders and improve work-life balance, Evans used PagerDuty to implement an on-call “shadow program,” which paired primary on-call engineers with new people joining the rotation.
Though the primary on-call engineers are in charge of driving action and response, those shadowing them are subject to the same SLAs, such as being no more than 15 minutes away from a laptop and responding to all of the alerts. PagerDuty’s automated on-call management features enable Evans to create multiple schedules for the primary responders and the shadow team. “That was the gateway for training the new on-callers to hone their skills in a safe, non-stressful way,” said Evans.
Monzo now has eight primary on-call engineers and an additional eight people shadowing them, bringing the total team size to 16. “The general health of on-call engineers is better,” shared Evans. “Just before I joined, three or four engineers left the rotation because they were burnt out. But since we implemented the on-call shadow program, no one has left the rotation.”
Charging Ahead Into the Future
In its current phase of rapid growth, Monzo will continue scaling its on-call teams with PagerDuty so that individual teams are empowered to manage their own schedules. Evans also plans to leverage PagerDuty Analytics so he has insight into noisy services that require attention, as well as how individuals are doing on call. “We want to be able to monitor how heavily affected people are by being on call for any given week,” explained Evans. In addition, once Monzo has a significant amount of data flowing through PagerDuty, Evans plans to explore PagerDuty Event Intelligence and its machine learning capabilities to further improve Monzo’s real-time operations.
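As a rough illustration of the two questions raised here (which services are noisy, and how heavily each person is paged in a given week), the following Python sketch tallies a small alert log. The log format, service names, and responder names are all hypothetical. The sketch does not use PagerDuty Analytics or any PagerDuty API; it only shows the kind of aggregation involved.

```python
from collections import Counter
from datetime import datetime

# Hypothetical alert log: (time the page fired, service, responder paged).
pages = [
    (datetime(2019, 3, 4, 2, 15), "payments", "alice"),
    (datetime(2019, 3, 5, 23, 40), "payments", "alice"),
    (datetime(2019, 3, 6, 11, 5), "ledger", "bob"),
    (datetime(2019, 3, 12, 3, 30), "payments", "bob"),
]


def pages_per_responder_week(pages):
    """Count pages per (responder, ISO year, ISO week)."""
    counts = Counter()
    for fired_at, _service, responder in pages:
        year, week, _weekday = fired_at.isocalendar()
        counts[(responder, year, week)] += 1
    return counts


def noisiest_services(pages, top=3):
    """Return the services that generated the most pages, noisiest first."""
    return Counter(service for _fired_at, service, _responder in pages).most_common(top)


for (responder, year, week), count in sorted(pages_per_responder_week(pages).items()):
    print(f"{responder} was paged {count} time(s) in {year}-W{week:02d}")
print("Noisiest services:", noisiest_services(pages))
```

Grouping by ISO week keeps the per-person totals aligned with a weekly rotation. A real analysis would likely also weight pages by time of day, since an overnight page is more disruptive than a daytime one.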
“We know that PagerDuty will always be up and that we can rely on it,” said Evans. “I’m instantly less stressed because I don’t have that feeling of isolation. From day one, PagerDuty has really helped.”
To learn more about what PagerDuty can do for your organization and sign up for a free trial, visit www.pagerduty.com.