Sky Betting and Gaming (SBG) is the leading mobile and online betting and gaming operator in the UK. Its platform processes over 44 million game transactions every week and posts over 50 million content updates every day. “Making sure customers get the best service is the ultimate goal for us,” said Rachel Watson, Head of Service Operations at SBG. With fierce competition targeting the same users, SBG must provide an outstanding customer experience by ensuring that its platform is available 24/7.
Rapid Growth, Manual Processes Call for PagerDuty
Scalability and availability are mission critical to SBG, as the company supports over 2 million active users and continues to grow rapidly. “If a customer wants to place a bet, they want to do it now. They don’t want to do it an hour later. By then, they would have placed their bet with somebody else,” said Watson.
As SBG moved toward a DevOps model, with engineering squads responsible for fixing the code they build, its manual incident management process did not scale. “As more squads joined the on-call rotation, we could no longer have the traditional handoffs through phone calls,” said Watson. “If we misdialed a single digit, we’d end up leaving a message for somebody who doesn’t even work for us.” Watson’s team was often unable to reach the right people in a timely manner, if at all, and mobilizing the appropriate responders took at least 30 minutes.
SBG implemented PagerDuty to mitigate business disruption and accelerate response by automating the on-call management process. “Since using Modern Incident Response, our MTTR has decreased by 86%. The team’s morale has improved considerably as well as people’s satisfaction in their roles since PagerDuty has removed almost all manual aspects of monitoring. We’ve managed to claim back a considerable amount of time which has been reinvested in new projects as well as learning and development,” shared Watson.
Reduced Noise Improves Visibility
In addition to being able to automatically mobilize teams, SBG can now provide incident context so teams can immediately initiate a response. “Previously, when we got a major incident alert, we didn’t know what the issue was,” Watson explained. “We had a sea of red all the time because nobody had visibility into which alerts were genuinely critical.”
With the PagerDuty Visibility console, the Service Operations team now has a central view of everything occurring within the IT environment, be it callouts, major incidents, or low-level alerts. As a result, the productivity and engagement of SBG squads have improved: they know the notifications they receive are genuinely urgent, so they are empowered to take action immediately. “PagerDuty has helped us move away from an excess of false and redundant alarms; it allows us to focus purely on service impact and truly critical alerts.”
The Service Operations team can now also identify trends and engage the engineering squads to investigate further. “If we’re seeing continual alerts, we can take them to the relevant squads and ask to look into why we are getting so many alerts,” said Watson.
Integrations Power Tribe Autonomy
Driving this increased visibility is PagerDuty’s comprehensive technology ecosystem of 300+ integrations, which allows the Service Operations team and engineering tribes to connect PagerDuty to several different monitoring tools. “In general, each tribe uses Prometheus, New Relic, Grafana and Nagios,” said Watson. “As long as they are feeding into PagerDuty, each tribe has the autonomy to choose the tools they want to use, while we simultaneously unify the incident management process.”
SBG also recently went live with Jira and will be leveraging PagerDuty’s integration to automatically raise tickets within Jira when an incident occurs. “It was very separate before, where we would look at an alert, raise an incident, call out, and then raise a ticket in Jira. You can do that all now within one tool,” said Watson.
For SBG, the number of manual tasks has been reduced dramatically, improving operational efficiency while ensuring its platform stays available for its users. Said Watson, “PagerDuty gets the right people engaged at the right time, all in one push of a button.”
To learn more about what PagerDuty can do for your organization and sign up for a free trial, visit www.pagerduty.com.
“With PagerDuty, we get someone online in less than four minutes. Our average time to restore an incident is now under the 30 minutes that we used to spend manually contacting people.”