(This blog post is inspired by the talk that I will be giving at DevOps Talks Conference Melbourne and DevOps Talks Conference Auckland. Hope to...
by Matt Stratton
March 4, 2019
The fear of failure can be a massive hurdle for many development and ops team members. This fear can be so overbearing that morale across the board drops significantly, hurting employee productivity and advancement.
Having appropriate incident management and monitoring in place can, therefore, do more than just manage alerts and provide analytics; it can also relieve the burden of failure and empower DevOps teams as they transform the way apps are built and shipped. Let’s look at a few ways in which having an effective incident management solution boosts the morale of dev teams and enables them to do their best work.
Stress relief begins with having the right data. If you don’t have the right data, you don’t know where to look or what to resolve. Incident management centralizes the right alerts with the right context and helps you dig deeper by surfacing relevant data from your monitoring tools. This makes teams more effective when troubleshooting incidents and helps them resolve issues faster. By integrating data from all your tools into an incident management solution, teams gain end-to-end visibility into who’s working on what. Moreover, they’re equipped with the data (such as runbooks and graphs) they need to remediate an issue quickly. Faster resolution tends to result in happier teams.
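To make the idea of centralizing alerts with context a little more concrete, here is a minimal sketch in Python. It is not how any particular incident management product works; the `Alert` and `Incident` classes, the `centralize` function, the tool names, and the runbook URL are all hypothetical and exist only for illustration. The sketch groups alerts from several monitoring tools into one incident per affected service and collects the runbook links attached to them.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    source: str            # monitoring tool that raised the alert (hypothetical names)
    service: str           # service the alert is about
    summary: str           # short human-readable description
    runbook_url: str = ""  # optional context attached to the alert


@dataclass
class Incident:
    service: str
    alerts: list = field(default_factory=list)

    def runbooks(self):
        # Unique, non-empty runbook links, in the order they were first seen.
        seen = []
        for alert in self.alerts:
            if alert.runbook_url and alert.runbook_url not in seen:
                seen.append(alert.runbook_url)
        return seen


def centralize(alerts):
    """Group alerts from every tool into one incident per affected service."""
    incidents = {}
    for alert in alerts:
        incident = incidents.setdefault(alert.service, Incident(alert.service))
        incident.alerts.append(alert)
    return incidents


# Illustrative data only; these alerts and the URL are made up.
example_alerts = [
    Alert("metrics", "checkout", "p99 latency above threshold",
          "https://example.com/runbooks/checkout-latency"),
    Alert("logs", "checkout", "spike in 500 responses"),
    Alert("uptime-probe", "search", "health check failing"),
]
example_incidents = centralize(example_alerts)
for service, incident in example_incidents.items():
    print(service, len(incident.alerts), incident.runbooks())
```

With the sample data, the two `checkout` alerts land in a single incident alongside their runbook link, so a responder sees both signals and the remediation steps in one place instead of in two separate tools.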
When you have a well-oiled incident management process in place, you know what causes repeat downtime and where your infrastructure or app is most vulnerable to failure, which brings confidence when adding new features. Your QA team, for example, can write tests for the specific areas of the app that need more attention, and can even veto a new feature because they know from experience that it causes issues. To give useful feedback on new features, teams need a deep awareness of how the system functions and of its recurring issues, and that understanding is much easier to surface with an incident management platform. This predictability helps teams “trust the system” and know what to expect at every step. That peace of mind boosts employee morale and confidence.
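As a rough illustration of spotting what causes repeat downtime, the sketch below counts past incidents per component and lists the ones that recur. The incident log format, the component names, and the `recurring_components` function are assumptions made up for this example, not the export format of any real tool.

```python
from collections import Counter


def recurring_components(incident_log, min_count=2):
    """Return (component, count) pairs for components with repeat incidents."""
    counts = Counter(entry["component"] for entry in incident_log)
    # most_common() sorts by count, highest first.
    return [(name, n) for name, n in counts.most_common() if n >= min_count]


# Hypothetical incident history.
incident_log = [
    {"component": "payments-api", "cause": "connection pool exhausted"},
    {"component": "search", "cause": "index rebuild timeout"},
    {"component": "payments-api", "cause": "connection pool exhausted"},
    {"component": "payments-api", "cause": "bad deploy"},
]

print(recurring_components(incident_log))
```

A list like this is the kind of evidence a QA team could point to when deciding where to add tests or when pushing back on a risky change.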
Traditionally, development teams have assumed their job is done once they’ve written code for the app and have “thrown it over the wall” to Ops. Similarly, Ops would assume it’s not their job to ensure the quality of the code submitted, and they’d normally just deploy anything that came their way from development. With incident management, development and Ops teams are aligned on a unified platform, creating a single source of truth with visibility and consistency across teams. They are aware of which issues are caused by code and which are due to infrastructure. This means uptime is measured not just by how many servers are up or how much of the app is running normally, but by the end-user experience. This realistic view of uptime aligns teams’ goals and processes with the broader business goal of delivering applications that exceed user expectations. Happy customers make teams happy, and what better way to boost employee morale than by showing a team that their work is delighting end users?
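The difference between counting servers and measuring the end-user experience can be shown with some simple arithmetic. The sketch below is an illustration under assumed numbers, and the function names `server_uptime` and `user_availability` are hypothetical: it compares the fraction of servers that are up with the fraction of user requests that actually succeeded.

```python
def server_uptime(servers_up, servers_total):
    """Fraction of servers currently reporting healthy."""
    if servers_total <= 0:
        raise ValueError("servers_total must be positive")
    return servers_up / servers_total


def user_availability(successful_requests, total_requests):
    """Fraction of user requests that were served successfully."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return successful_requests / total_requests


# Assumed numbers: every server is up, yet a bad release fails 8% of requests.
infra_view = server_uptime(servers_up=10, servers_total=10)
user_view = user_availability(successful_requests=9_200, total_requests=10_000)

print(f"server uptime:     {infra_view:.1%}")
print(f"user availability: {user_view:.1%}")
```

In this made-up scenario the infrastructure view reports 100% while users see 92%, which is why a shared, user-facing measure gives development and Ops the same target.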
Speed is essential when you want to build game-changing applications. Incident management enables clear communication between development and ops teams (as well as help desk, support, business stakeholders, and others) whenever an incident occurs. When communication bottlenecks are reduced, team members can spend far less time troubleshooting issues that crop up repeatedly, and more time building and shipping the app to keep customers happy. The extra time spent developing the app results in higher-quality builds and a much faster time to market. As deployment becomes quicker and the team accomplishes more per day, they begin to feel empowered, engaged, and passionate about their work.
Incident management empowers team members to be more accountable during incidents and to establish incident command, so they can make important decisions themselves without having to rely on higher-ups. The ability to own decisions may seem simple, but it goes a long way in making team members feel more effective and confident in their organization’s processes. There’s less red tape because they don’t waste time constantly seeking permission from a higher-ranking team member. Team members are all accountable for being on-call to ensure the customer experience is supported around the clock. Allowing team members at every level to be involved in decision-making also frees up valuable time, as less time spent in frantic chaos or redundant work means more time for actual development.
An effective incident management solution will improve employee morale by reducing the fear of failure and helping your teams confidently manage the unexpected. The confidence earned lets team members innovate and standardize across the pipeline, and greatly boosts productivity and success.