Loblaw Companies Limited is Canada’s largest retailer and leader in the food and pharmacy industries, with a mission of empowering customers to “Live Life Well®.” The company provides online and brick-and-mortar marketplaces with access to groceries, apparel, health and beauty products, and financial services.
Jaspal Sawhney, Senior Director of SRE at Loblaw Technology, manages a team of over 100 software developers and engineers. The team is expected to build the tooling and capabilities that allow development teams to fully own their services. With the COVID-19 pandemic driving more online and mobile commerce than ever before, reliability and customer convenience are at the forefront of Loblaw’s priorities.
Loblaw’s philosophy has always been focused on customer convenience. “Wouldn’t it be great if you could go to one place and pick up birthday gifts, party favors, and get the catering as well? That’s the vision of ultimate convenience,” explained Jaspal. “As a retailer, our competition is fierce. If we don’t compete at a high level, then we may not be around in 5 to 10 years.”
A few years ago, Loblaw was a very traditional IT shop, with siloed teams and its own data centers. This created problems around visibility and accountability when issues arose. “We basically had a lack of ownership because everything would just be getting thrown over the fence,” said Jaspal.
As Loblaw grew and systems became more complex, centralized incident management was simply not scalable. Minor issues would turn into major incidents, which caused an increase in mean time to resolve (MTTR). “We would sometimes have outages that lasted longer than we would’ve liked,” explained Jaspal. “Every minute you’re down is serious dollars you’re losing.”
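As a rough illustration of the metric mentioned above, MTTR is commonly computed as the total time spent resolving incidents divided by the number of incidents. The sketch below assumes each incident is recorded as an (opened, resolved) pair of timestamps; the function name and the sample data are hypothetical and are not taken from Loblaw's systems.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def mean_time_to_resolve(incidents: List[Tuple[datetime, datetime]]) -> timedelta:
    """Average of (resolved - opened) across all incidents."""
    if not incidents:
        raise ValueError("at least one incident is required")
    durations = [resolved - opened for opened, resolved in incidents]
    # sum() needs a timedelta start value because the default start is the int 0
    return sum(durations, timedelta()) / len(durations)


# Hypothetical sample: one 30-minute incident and one 90-minute incident
sample = [
    (datetime(2020, 3, 1, 9, 0), datetime(2020, 3, 1, 9, 30)),
    (datetime(2020, 3, 2, 14, 0), datetime(2020, 3, 2, 15, 30)),
]
mttr = mean_time_to_resolve(sample)
print(mttr)
```

A longer-running outage pulls this average up quickly, which is why minor issues escalating into major incidents shows up directly in the metric.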
Part of Loblaw’s digital transformation efforts involved transitioning to a full-service ownership model leveraging cloud technologies—and that would take time to succeed. The first step was changing the roles within the technical teams at Loblaw Digital so that everyone was a developer responsible for the code they developed. “If you are in Loblaw Digital, then you are fundamentally writing software. And code runs the lifecycle of everything we have in our organization,” he explained.
The next step was cloud adoption so that the entire team would have visibility into their code and could follow it through to production. “Moving to the cloud allowed for a lot more control through pipelines and gave teams more visibility, accountability, and auditing capabilities than they had before,” shared Jaspal. The teams could use a host of cloud tools to ideate, build, and test their own code, allowing for new solutions to be built in a fraction of the time it took before. Having full ownership of their code also enabled the team to better understand the business impact of their work, which created even more accountability.
While the move to the cloud enabled teams to move faster, the risk of failure was also higher, and it was critical that teams could learn from failures without assigning blame. This required Loblaw Digital to change its culture and seek a system that would provide psychological safety.
The final step to this transformation was bringing in a platform that could support a full-service ownership model for every person on the Loblaw Digital technical teams.
Loblaw adopted PagerDuty as the final puzzle piece in their digital transformation journey and shift to a full-service ownership model. PagerDuty enables Loblaw Digital to identify issues and understand if the right team member is the first responder to an incident. Rather than a centralized team scrambling to identify the root cause, all customer-facing services are now broken into individual components with ownership by the teams that wrote the code. With this model, the teams can push code more frequently, use the monitoring tools they want, integrate them into PagerDuty, and set their own schedules and escalation policies for the services and applications they own.
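The ownership model described above can be pictured as a simple mapping from each service to its owning team and that team's escalation policy. The sketch below is a minimal illustration only: the class names, service names, and responder labels are all hypothetical, and it does not use or represent the PagerDuty API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class EscalationPolicy:
    # Ordered responders; index 0 is notified first (hypothetical labels)
    levels: List[str]

    def responder_at(self, level: int) -> str:
        # Past the last level, keep notifying the final responder
        return self.levels[min(level, len(self.levels) - 1)]


@dataclass
class Service:
    name: str
    owning_team: str
    policy: EscalationPolicy


def route_alert(services: Dict[str, Service], service_name: str,
                missed_acknowledgements: int = 0) -> Tuple[str, str]:
    """Return (owning team, responder to notify) for an alert on a service."""
    svc = services[service_name]
    return svc.owning_team, svc.policy.responder_at(missed_acknowledgements)


# Hypothetical catalogue: each team defines the policy for the service it owns
services = {
    "checkout": Service("checkout", "checkout-team",
                        EscalationPolicy(["on-call-developer", "team-lead"])),
    "search": Service("search", "search-team",
                      EscalationPolicy(["on-call-developer"])),
}

print(route_alert(services, "checkout"))     # first responder on the owning team
print(route_alert(services, "checkout", 1))  # escalates after one missed acknowledgement
```

The point of the structure is that an alert goes straight to the team that wrote the code, rather than to a central group that must first work out who owns the failing component.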
Additionally, Loblaw uses PagerDuty to facilitate postmortems, now a staple of its incident management process. PagerDuty provides a record of incidents used by Loblaw to build a blameless retrospective culture that focuses on continuous improvement without pointing fingers. “Now, teams are able to dive into these retros and share details to learn from them,” explained Jaspal.
PagerDuty has helped Loblaw with:
“With PagerDuty, we have been able to embrace a full-service ownership model for developers, which has been adopted by all teams taking the SRE journey as they modernize their applications for the cloud,” shared Jaspal.
Loblaw was well on its digital transformation journey when the pandemic began, so it was able to manage the increase in online traffic with little disruption. “This agile, full-service ownership model was really validated when the pandemic hit,” explained Jaspal. “As an essential service, we needed to be able to pivot and build these new products within a matter of days.” In the first four weeks of lockdown, Loblaw Digital built new solutions for seniors, healthcare workers, and frontline workers, with a focus on making their shopping experience as convenient as possible.
Full-service ownership created the autonomy and accountability for Loblaw to make decisions quickly and build as fast as possible. While other retailers took some hits during the pandemic, Loblaw Digital remained resilient and pivoted quickly because it had already embraced this model.
Because of Loblaw Digital’s success, the broader Loblaw enterprise has taken notice. “The key is asking, ‘How do we take what we’ve done within Loblaw Digital after proving some success with it and replicate that across the entire enterprise?’” shared Jaspal. With this in mind, Loblaw Digital’s site reliability engineering function has transitioned into an enterprise team, focusing on smoothly creating and scaling highly reliable software solutions for the entire organization. “When it comes to digital transformation, there really is no end state. It’s always going to be evolving, and it’s our job to continuously improve, learn, and reach digital maturity and then push the bar higher towards engineering excellence,” explained Jaspal.