In the United States, it’s almost that time of year again when we count our blessings and give thanks. For retail workers, it’s also the time of year when they prepare for the onslaught of eager shoppers who have waited hours in line to rush into stores and get their hands on doorbuster deals (sometimes knocking down employees in the process).
And for IT responders, it’s the time of year when their holiday dinners could get interrupted by a series of alerts about the website or point-of-sale (POS) system going down. Or that the inventory tracking and shipping systems aren’t updating. Or that advertised promotion codes aren’t working as they should. You get the idea.
Get ready to say hello to Black Friday, everyone!
Black Friday is known as the day that officially kicks off the holiday shopping season, but no one really knows how this American tradition got its name. The most recent explanation is that it’s the time of year when retailers turn a profit—essentially going from “in the red” to “in the black.”
Today, the term is somewhat ironic as the shopping frenzy brings so much activity that retail companies are prone to experiencing extensive service outages—blackouts, aka downtime. In the past when legacy systems were king, downtime was “accepted” as a fact of life in the IT world.
However, with Cyber Monday becoming just as popular as Black Friday, it’s more important than ever that retailers keep all systems up and running, because everything is interconnected: mobile sites, online orders, in-store pickups, order stacking, and inventory updates.
Make no mistake, brick-and-mortar stores are still very much relevant, but the line between in-person and digital sales is blurring. For example, by the end of Black Friday in 2017, consumers had spent roughly $5 billion through online platforms alone.
Additionally, according to Deloitte, about 67 percent of consumers are planning to make holiday purchases on their mobile devices this holiday season, compared to 59 percent last year. With such numbers at stake, it’s clear why retailers need to take steps to improve their digital operations to maintain an edge over the competition.
On today’s Internet, speed isn’t everything. It’s the only thing. When it comes to the digital experience, consumer expectations are always rising. In fact, a study found that 53 percent of users will abandon a website if the loading time exceeds three seconds.
For example, suppose a customer spends 30 minutes browsing a website and filling their online shopping cart, only to find they can’t check out because the website crashed, or to receive an email later saying an item is unavailable because the inventory count wasn’t updated. They’ll share their frustrations about your platform with their peers. (Okay, okay, that example is from personal experience. I will purchase my scented loofah set elsewhere, thankyouverymuch.)
Now imagine if this happened to hundreds or thousands of users per minute—the potential loss of revenue could seriously hurt the business and negatively impact customer loyalty.
Ensuring a repeatable and consistent online buyer experience is vital to maintaining customer loyalty and brand reputation. This is where the behind-the-scenes IT teams come in.
When backend systems slow down or crash completely, IT responders need to resolve the issue as fast as possible, before it widely affects customers, to minimize the impact on the business. That often comes at the expense of family or personal time. But manually managing incident alerts during the holiday season is like trying to stop the flow of a firehose with your hands: it’s just not practical.
A modern IT infrastructure is built around redundancy and can carry a complex tech stack that includes, for example, AWS instances and storage, physical data centers, and a combination of multiple SaaS systems. As an infrastructure increases in complexity, monitoring all aspects of said infrastructure using disparate toolsets can quickly become overwhelming.
During the holidays, this reality can be even more daunting, as site traffic can increase astronomically within minutes, even seconds, during a flash sale. Many retailers already implement holiday freezes, during which no code changes are made unless there’s an emergency. Others also set up “war rooms,” staffed with support teams and developers who are on call around the clock so they can engage the right people, react quickly to head off bigger issues, and minimize customer impact.
Consolidating alerts and events into a single point of ingestion enables responders to separate signal from noise using a mix of rules and machine learning. This prevents alert fatigue by letting teams easily determine which alerts need attention. Properly implemented, it can even give your teams time to spend with their families during the holidays!
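To make the idea of a single point of ingestion a little more concrete, here is a minimal Python sketch of the rules half of that approach (the machine learning half is out of scope here). Everything in it is an assumption made for illustration: the Alert fields, the AlertIngestor class, the tool and service names, and the two rules themselves are hypothetical and do not represent any particular product’s API.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class Alert:
    source: str    # monitoring tool that emitted the alert (hypothetical name)
    service: str   # affected system, e.g. "checkout"
    check: str     # what failed, e.g. "http_5xx"
    severity: str  # "info", "warning", or "critical"


class AlertIngestor:
    """Single entry point that every monitoring tool sends its alerts to."""

    def __init__(self, page_on: Tuple[str, ...] = ("critical",)) -> None:
        self.page_on = set(page_on)
        # (service, check) pairs that have already paged someone.
        self.open_incidents: Set[Tuple[str, str]] = set()

    def ingest(self, alert: Alert) -> str:
        # Rule 1 (assumed): low-severity alerts never page anyone.
        if alert.severity not in self.page_on:
            return "suppressed"
        # Rule 2 (assumed): the same failure reported by several tools
        # counts as one incident, so only the first report pages.
        key = (alert.service, alert.check)
        if key in self.open_incidents:
            return "deduplicated"
        self.open_incidents.add(key)
        return "page"


ingestor = AlertIngestor()
incoming = [
    Alert("uptime-probe", "checkout", "http_5xx", "critical"),
    Alert("apm-agent", "checkout", "http_5xx", "critical"),
    Alert("disk-monitor", "inventory", "disk_usage", "warning"),
    Alert("uptime-probe", "pos", "heartbeat", "critical"),
]
decisions = [ingestor.ingest(alert) for alert in incoming]
print(decisions)
```

In this sketch, four incoming alerts produce only two pages: the second checkout alert is folded into the first, and the disk warning is held back. A real setup would also need a way to close incidents and to escalate warnings that persist.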
As the holidays approach and consumers are bookmarking their tabs and filling their carts, IT teams should ensure they’re prepared to respond to incidents quickly and effectively by asking the following:
These are just some of the questions that need to be asked to help ensure uptime of your mission-critical systems and a little downtime for your IT teams to enjoy the holidays with their loved ones. Happy Holidays!