by Dave Bresci
December 5, 2018
Central to the idea of DevOps is collaboration between all of the teams involved in IT applications and infrastructure. Developers, Operations, Quality Assurance, Security, and more, all have stakes in the delivery of a product or service. So, what does collaboration mean when it comes to monitoring in the world of DevOps?
In the past, different teams involved in creating or maintaining an application would first finish their portion completely before passing it on to the next team. For example, the Development team would first write code for the entire app, or specific features in the app, before passing it to the Quality Assurance (QA) team. The QA team would then do their testing and analysis before sending it forward to Operations, and so on.
This is similar to an Olympic relay team, where each runner covers 100 meters before passing the baton to the next. Now picture a team using DevOps ideas as those same four runners all running their 100 meters at the same time, rather than one after another. Instead of waiting for one team to hand off work to the next, each team works concurrently on the application in its own area of focus. The DevOps runners have more opportunities to adjust strategy and iteratively improve.
DevOps concepts are commonly referenced alongside other software development practices such as Agile. The ideas behind Agile and DevOps are similar — break down work into smaller increments, iteratively improve in each increment, eliminate toil where it makes sense, and share learnings across the organization.
Of course, too many opinions can lead to their own set of problems. Well-architected applications tend to have multiple components and clearly defined areas of concern. This is where distributed architectures such as microservices often come into the picture. This is also why teams that adopt the DevOps methodology produce a lot more releases, as they can address issues more quickly.
Two minds are better than one and a dozen minds are even better. Transparency and end-to-end visibility are key factors within DevOps for good communication. End-to-end visibility means having everyone on the same page through the entire development process. When everyone is on the same page, the development process becomes a lot smoother, and there are fewer chances of having to redo things.
What’s important to remember among all the mayhem that goes into building an application is that the ultimate goal is to refine and enhance the end-user experience. Since most cloud-based applications need to react swiftly to changing user needs and market shifts, it is sometimes necessary to release updates for these applications quickly. Software developers using DevOps ideas can keep up with this requirement more easily, as there are fewer bottlenecks in the cross-team workflow.
Apart from the fact that this makes for happy customers, it also makes for happy developers, who are much less likely to lose sleep over bugs, crashes, or mishaps. Good communication is a key difference between the traditional waterfall style of development and DevOps methods, and it also shapes what teams need from their tools.
The DevOps style of monitoring, then, is by no means easier than the traditional Waterfall method where teams often just needed a couple of tools for the entire process and a lot of humans to make up the difference. Having a single team work across a wide variety of areas (development, QA, operations, and so on) requires numerous tools that need to work well with each other. Finding a group of tools that are all compatible can be challenging, but the results are well worth it.
One way to consider these tools is to split them into four groups of software: application performance monitoring, infrastructure monitoring, log analysis, and, last but not least, incident management.
Monitoring applications is a key factor in identifying issues (performance, regression, or otherwise) and fixing them quickly as part of a team’s iteration. Three of the more popular application performance monitoring (APM) tools are New Relic, Dynatrace, and AppDynamics. Apart from letting teams monitor and manage their software, they also allow for end-user monitoring, which is crucial to ensuring an application is delivering the best experience.
The other side of the coin to application performance monitoring is infrastructure monitoring. There are many, many tools for this, including SaaS solutions such as DataDog, LogicMonitor, and SignalFx, as well as hybrid solutions such as Zenoss. Though it’s a favorite of teams transitioning from legacy development processes, Nagios alone isn’t enough for your DevOps monitoring strategy.
Analyzing logs is a crucial part of the iterative improvement cycle and of system troubleshooting, debugging, and security incident response. Popular log analysis tools include Splunk, LogEntries, and Loggly. They allow you to study user behavior based on log file analysis. They also let you collect a vast amount of data from various sources in one centralized log stream, which is a convenient way to maintain and analyze log files. Apart from debugging, log analysis plays a key role in helping you comply with security policies and regulations, and it is vital in the process of auditing and inspection.
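To make the idea of one centralized log stream concrete, here is a minimal sketch in Python. It is not how Splunk, LogEntries, or Loggly work internally. It only assumes that each source already emits lines in time order and that every line starts with an ISO-8601 timestamp. The names `LogEntry`, `parse_stream`, and `centralize` are made up for this illustration.

```python
import heapq
from datetime import datetime
from typing import Iterable, Iterator, Mapping, NamedTuple


class LogEntry(NamedTuple):
    timestamp: datetime
    source: str
    message: str


def parse_stream(source: str, lines: Iterable[str]) -> Iterator[LogEntry]:
    """Turn raw lines from one source into LogEntry records.

    Assumed line format (hypothetical): "<ISO-8601 timestamp> <message>".
    """
    for line in lines:
        stamp, _, message = line.partition(" ")
        yield LogEntry(datetime.fromisoformat(stamp), source, message)


def centralize(sources: Mapping[str, Iterable[str]]) -> Iterator[LogEntry]:
    """Merge several per-source logs into a single chronological stream.

    heapq.merge is lazy, so it never loads a whole log into memory, but it
    relies on each individual source already being sorted by timestamp.
    """
    streams = [parse_stream(name, lines) for name, lines in sources.items()]
    return heapq.merge(*streams)


if __name__ == "__main__":
    sample = {
        "web": [
            "2018-12-05T10:00:01 GET /login 200",
            "2018-12-05T10:00:05 GET /cart 500",
        ],
        "db": ["2018-12-05T10:00:03 slow query 1200ms"],
    }
    for entry in centralize(sample):
        print(entry.timestamp.isoformat(), entry.source, entry.message)
```

Reading the merged stream top to bottom puts the slow database query right between the two web requests, which is the kind of cross-source context that is hard to see when each log lives in its own file.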
DevOps really outshines the traditional Waterfall approach in the field of incident management. It is with the help of incident management platforms like PagerDuty that cloud-based applications can now encourage users to resolve most of their issues themselves. PagerDuty is unique in its capabilities: it not only integrates with just about any tool you can imagine, it also lets you customize your incident management workflows to suit different teams. This approach avoids bottlenecks and helps achieve the ultimate goal of providing the best user experience.
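As a rough illustration of what per-team workflows can look like, the Python sketch below routes an alert to a team based on which monitoring group raised it, and pages someone only for critical alerts. This is not PagerDuty's API or data model. The `Alert` and `Route` types, the `ROUTES` table, and the team names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Alert:
    source: str    # which tool group raised it: "apm", "infrastructure", "logs"
    severity: str  # "critical", "warning", or "info"
    summary: str


@dataclass(frozen=True)
class Route:
    team: str
    notify: str    # "page" interrupts the on-call engineer; "ticket" can wait


# Hypothetical ownership table: each tool group maps to the team that owns it.
ROUTES: Dict[str, str] = {
    "apm": "backend",
    "infrastructure": "platform",
    "logs": "security",
}
DEFAULT_TEAM = "on-call-generalist"


def route(alert: Alert) -> Route:
    """Pick the owning team and decide how loudly to notify them."""
    team = ROUTES.get(alert.source, DEFAULT_TEAM)
    notify = "page" if alert.severity == "critical" else "ticket"
    return Route(team, notify)


if __name__ == "__main__":
    print(route(Alert("infrastructure", "critical", "disk full on db-1")))
    print(route(Alert("apm", "warning", "p95 latency creeping up")))
```

Keeping the rules in a plain table means each team can change its own row without touching anyone else's workflow.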
Following DevOps practices makes applications more resilient, as it becomes easier to uncover implementation issues and discover deficiencies. With infrastructure monitoring tools, a sudden influx of users that would normally crash a system can easily be detected. With application monitoring tools, engineers can dive into the specifics of how performance is being impacted. Any issues that have escaped through all the different levels of quality and monitoring often require closer inspection through log analysis tooling. All the while, these tools tie into an incident management platform like PagerDuty, so engineers can quickly triage and collaborate on incidents.
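Detecting a sudden influx of users can be sketched with a simple rolling baseline. The Python example below is an assumption-laden toy, not the algorithm any of the tools named above use. It flags a sample as a spike when it exceeds a multiple of the recent average. The class name `SpikeDetector` and its `window` and `factor` parameters are invented for this example.

```python
from collections import deque
from statistics import mean
from typing import Deque


class SpikeDetector:
    """Flag traffic samples that far exceed the recent rolling average."""

    def __init__(self, window: int = 5, factor: float = 3.0) -> None:
        self.history: Deque[float] = deque(maxlen=window)
        self.factor = factor

    def observe(self, requests_per_minute: float) -> bool:
        # No verdict until the window is full, so there is a real baseline.
        has_baseline = len(self.history) == self.history.maxlen
        is_spike = (
            has_baseline
            and requests_per_minute > self.factor * mean(self.history)
        )
        # Keep spikes out of the baseline so a sustained surge stays flagged.
        if not is_spike:
            self.history.append(requests_per_minute)
        return is_spike


if __name__ == "__main__":
    detector = SpikeDetector(window=3, factor=2.0)
    for sample in [100, 110, 90, 105, 400, 95]:
        if detector.observe(sample):
            print(f"spike: {sample} requests/minute")
```

In a real setup, the `True` result would be handed to the incident management platform rather than printed, and the threshold would be tuned against historical traffic.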
As far as enhancing the user experience goes, DevOps is light years ahead of the traditional waterfall method, and those looking to keep up with the pace of technology will have to adopt at least some aspects of this approach. Users of applications that are built and monitored by teams that use DevOps methodologies are accustomed to having their problems solved quickly and efficiently. These same users will seldom tolerate long waits for bug fixes or for recovery from catastrophes like system crashes, and anyone who wants to make an impact on this market will have to step up their game or be left behind.