What Is Digital Operational Maturity?
Digital operational maturity is defined as an organization’s effectiveness at real-time work and ability to focus on performance metrics that...
by Aditya Patil
January 3, 2019
Failure is not an option. That’s what we’d like to think, but we all know the truth: the question is not if failure will happen, but when. Large, complex systems are especially prone to failure, because their infrastructures often carry years of technical debt from intricate architectures, many of them pieced together through mergers and acquisitions. Add the pressure of keeping up with fast-evolving digital demand across the business, and failure becomes a real cause for concern. The airline industry knows this well.
As a scan of recent news headlines shows, keeping these systems running is no easy task. From Southwest to Delta to the most recent British Airways system outage, we are starting to see a tipping point in an industry desperately trying to keep pace with digital innovation. We’ve seen a major airline brought to its knees by a power issue that cascaded through its systems, resulting in thousands of canceled flights. With increasing demand for a digital-first customer experience, airline IT systems have become major liabilities. Decades of business mergers and advances in technology have led to a patchwork of inconsistent and unreliable systems. In the digital, connected age, downtime is more than an inconvenience: it means millions of dollars in lost revenue and shaken consumer confidence.
Airlines have come a long way from the days when you would walk up to the counter or call a travel agent to buy a ticket. Complex automated systems and experiences, both internal and customer-facing, all contribute to optimizing revenue by ensuring that flights are full and on time, that equipment is used as fully as safety allows, and that every salted peanut is accounted for. All of this digital complexity came at a price. Airlines didn’t have the luxury of building the industry with a digital-first mindset. They didn’t get to sit around a table and discuss their customers’ mobile versus online experiences in relation to scheduling algorithms before planes were in the air. Like many other long-established industries, they had to adapt, build, refine, and patch their systems through decades of changes in technology, passenger expectations, and business practices, all without disrupting service. The result is an enterprise-level house of cards, and recent headlines show it beginning to wobble.
IT systems fail. They just do, and sometimes there is no way around it. DevOps culture embraces failure, and the teams that have adopted it have built digital companies, products, and services that can innovate quickly and react fast when downtime occurs. Modern operations require a sophisticated incident management process that hopes for the best but prepares for the worst. Incident management has to be a top priority and receive significant investment, because in today’s digital-first world every second of downtime translates directly into lost revenue. Southwest estimates its outage cost $54 million, and Delta Air Lines puts the price tag for its outage at $100 million. Given those numbers, it makes sense for modern operations teams to invest in the right people, processes, and tools to ensure that when critical incidents do occur, they are resolved as quickly as possible.
Catching up to modern operations doesn’t happen overnight. The airline industry has come a long way in a relatively short period of time, but it has a long way to go towards meeting the demands of a digital-first society.
To adapt and evolve with the changing times, IT operations teams must stay up to date with best practices for what to do when an outage or service disruption occurs, and for how to respond efficiently and reliably to restore service as quickly as possible. In this day and age, downtime or disrupted service of any length is unacceptable. To help prevent extended outages, enable your team to communicate better in a crisis, monitor the IT stack more carefully, and implement a modern operations solution for incident management.
Despite your best efforts to prevent outages, systems can still go down. Learn best practices for communicating during an outage, and which monitoring practices are critical to establish so you can respond to events efficiently.