In Australia, A$18 billion a year is spent on gambling, and William Hill Australia is one of the leading betting and gaming companies. For this digital-only business, the focus on and investment in IT Operations is one of its greatest differentiators: the team knows firsthand how important it is to deliver always-on, amazing digital experiences.
To deliver on their promise of exceptional customer experience, in 2018, William Hill Australia is embracing a more agile approach, migrating new and existing services onto AWS and using PagerDuty to support their journey.
William Hill’s customers won’t hesitate to move their business to a competitor if applications are slow or down; they follow events globally, so services need to be 100 percent available, every hour of every day. But for high-traffic “Tier One” events like the Melbourne Cup, the stakes are even higher: downtime can cost A$100,000 per minute, and the operations team needs to keep an eye on over 700 critical servers that make up core services and applications. “We’ve got to be available 100 percent of the time,” said Alan Alderson, Head of Infrastructure and Operations for William Hill Australia. “You just need one little blip and one minute of outage, and you’ve blown your KPI out the window.”
Being immediately alerted to any issue is critical for William Hill Australia so problems can be found and fixed before customers are aware. For Alderson, it’s important that his team finds the issues before anyone else. “I don’t want the business telling me I’ve got an issue. I want my technology to alert me,” explained Alderson. “That way, the business has confidence in us—that we’re on it, we’re monitoring our systems properly, we know when we’ve got an issue, and we’re working to restore services as quickly as possible.”
PagerDuty Automates Incident Management and Increases Visibility
Since implementing PagerDuty, William Hill Australia has reduced its manual efforts around incident management and increased its confidence that the right messages are getting to the right people.
William Hill Australia has numerous monitoring tools, including Splunk, AppDynamics, AWS CloudWatch, and CA Unified Infrastructure Management. Prior to PagerDuty, all alerts would be sent to the Service Desk, which would then route “critical” alerts to ServiceNow, where an on-call engineer would watch the queue and manually call out if issues were not resolved.
“With PagerDuty, we are no longer relying on watching those queues in order to identify incidents and respond in a timely fashion,” said Alderson. Instead, PagerDuty correlates that data and immediately alerts engineers in the format they choose: SMS, mobile app, phone call, or email.
“[With PagerDuty], I have confidence that when something goes wrong, the service desk is going to get a phone call. And if that phone call isn’t answered, I know it’s going to be escalated,” Alderson said. “I know the issue is going to get picked up at some point in the next few minutes, rather than having to hope that somebody is watching a screen or monitoring an email queue. Alerts are now being picked up within seconds rather than minutes.”
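The escalation behavior Alderson describes, where an unanswered call moves on to the next responder, can be illustrated with a short sketch. This is a simplified model and not PagerDuty's implementation or API: the Responder type, the escalate function, the responder names, and the three-level policy are all hypothetical, and a real escalation policy waits for a configured timeout at each level rather than getting an immediate yes or no.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Responder:
    """A hypothetical escalation level: who to notify and how."""
    name: str
    channel: str  # e.g. "phone", "sms", "push", "email"


def escalate(
    levels: List[Responder],
    acknowledges: Callable[[Responder], bool],
) -> Optional[Responder]:
    """Notify each level in order and stop at the first acknowledgement.

    Returns the responder who acknowledged, or None if nobody did.
    """
    for responder in levels:
        print(f"Notifying {responder.name} via {responder.channel}")
        if acknowledges(responder):
            return responder
    return None


# Hypothetical policy: service desk first, then on-call engineer, then manager.
policy = [
    Responder("service-desk", "phone"),
    Responder("on-call-engineer", "sms"),
    Responder("ops-manager", "phone"),
]

# Simulate the service desk missing the call; everyone else answers.
owner = escalate(policy, lambda r: r.name != "service-desk")
print(f"Incident acknowledged by: {owner.name}")
```

In this simulation the service desk does not answer, so the incident passes to the on-call engineer, who acknowledges it. The point of the design is that an alert always has a next recipient, so nobody has to be watching a queue for it to be picked up.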
Future Plans: Continuing Cultural Change Through PagerDuty
The newfound confidence is critical as William Hill Australia implements its cloud migration strategy. “Since January, we’ve been migrating our product, our infrastructure, and applications into AWS from our on-premise data centers,” explained Alderson. By combining PagerDuty with AWS CloudWatch’s system-wide visibility, William Hill can once again define how the team receives alerts and creates incidents.
Alderson acknowledged that full implementation of PagerDuty will take time. “You can’t change culture overnight, but over time I want the rest of the business to see the value of PagerDuty,” he explained. “We’re becoming a more agile environment. We’re only starting out with our ‘build it, own it, fix it’ philosophy, but PagerDuty will help us mature in this space.”
“PagerDuty is going to be the cornerstone of my plan for next year,” stated Alderson.