It’s easy to feel underutilized as an engineer working in a NOC. Especially in a larger organization, you may find yourself siloed into owning highly specific responsibilities.
At PagerDuty, we don’t believe that any engineer should sit around, wasting time, watching lines on graphs move up and down. You’re too smart to waste your talents. Instead, whenever you are on the clock you should be an integral part of your team, pushing the needle towards the future.
If you ever find yourself frustrated that you are sitting idly, waiting to make phone calls while your company’s servers catch on fire, we encourage you to actively rethink your role and start breaking down the silos in your organization.
As an engineer in a NOC, you have the unique ability to touch several areas of an organization and play a vital role in the exchange of information. You can help solve problems faster and grow your business. But this is nearly impossible to accomplish when you spend your time dialing phones instead of using your skills.
Human Brute Force, Meet Automation Accuracy
You are the first line of defense when an incident occurs within your infrastructure, which is an incredibly noble role to play. As the first person to encounter an issue in your company’s infrastructure, you can do one of two things: take action and coordinate your team’s knowledge and efforts to fix the problem, or simply make phone calls to find someone to research and resolve the incident.
Spending time flipping through lists of team members can be excruciating, especially when contact information isn’t up to date. With the right tools you can automate the painful part of escalating incidents to your on-call teams. These basic tasks can easily be eliminated with automation to increase the value you provide.
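To make the idea of automated escalation concrete, here is a minimal sketch in Python. It is not PagerDuty’s implementation or API: the `Responder` type, the `escalate` function, and the `notify` callback are all hypothetical names, and the example assumes a notifier that simply reports whether the responder acknowledged.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Responder:
    name: str
    contact: str  # hypothetical contact handle (email, phone, chat ID)


def escalate(
    tiers: List[List[Responder]],
    notify: Callable[[Responder, str], bool],
    incident: str,
) -> Optional[Responder]:
    """Notify each tier in order; stop at the first responder who acknowledges."""
    for tier in tiers:
        for responder in tier:
            if notify(responder, incident):
                return responder
    return None  # nobody acknowledged, so a human needs to step in


# Hypothetical two-tier policy: primary on-call first, then two backups.
policy = [
    [Responder("ana", "ana@example.com")],
    [Responder("bo", "bo@example.com"), Responder("cy", "cy@example.com")],
]
attempted = []


def fake_notify(responder: Responder, incident: str) -> bool:
    # Stand-in for a real phone/SMS/push integration: only "bo" acknowledges.
    attempted.append(responder.name)
    return responder.name == "bo"


owner = escalate(policy, fake_notify, "db-primary unreachable")
print(owner.name, attempted)
```

In this run the primary does not acknowledge, so the incident moves to the second tier and stops as soon as someone takes it. Nobody had to flip through a contact list.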
Empower Yourself, Increase Everyone’s Productivity
By supplying vital information that helps resolve incidents, you are empowering yourself and taking initiative that will surely be noticed. Not to mention, you will probably be happier knowing that you have positively impacted the success of your company.
Supplying the on-call engineer responsible for an incident with the correct runbook, or identifying a network latency or load balancer issue between sites that wasn’t apparent from your monitoring tool’s report, can greatly reduce an incident’s mean time to repair (MTTR). This assistance will be invaluable and greatly appreciated by your team.
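For readers who have not computed MTTR before, it is the average time between an incident opening and being resolved. The sketch below assumes each incident is recorded as an (opened, resolved) pair; the function name and the timestamps are made up purely for illustration.

```python
from datetime import datetime, timedelta


def mean_time_to_repair(incidents):
    """Average of (resolved - opened) across a list of incidents."""
    durations = [resolved - opened for opened, resolved in incidents]
    return sum(durations, timedelta()) / len(durations)


# Illustrative (invented) incident log: 45, 15, and 60 minutes to repair.
incidents = [
    (datetime(2017, 8, 1, 2, 0), datetime(2017, 8, 1, 2, 45)),
    (datetime(2017, 8, 3, 14, 10), datetime(2017, 8, 3, 14, 25)),
    (datetime(2017, 8, 9, 9, 0), datetime(2017, 8, 9, 10, 0)),
]
mttr = mean_time_to_repair(incidents)
print(mttr)  # 0:40:00
```

Handing over the right runbook early shortens each individual duration, which is what pulls this average down.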
Filter, With a Human Touch
At PagerDuty, alert bundling and deduplication are among our most valued features for eliminating alert fatigue. But tools are only as smart as their users. As an engineer in a NOC, you have the unique ability to apply a human touch to filter alerts for your company’s on-call engineers.
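As a rough illustration of what bundling and deduplication mean in practice, the sketch below groups raw alerts that share a key into a single incident. This is a simplified assumption, not a description of how PagerDuty does it: the `bundle_alerts` function, the choice of `(service, check)` as the key, and the alert fields are all hypothetical.

```python
def bundle_alerts(alerts):
    """Collapse alerts that share a (service, check) key into one incident each."""
    incidents = {}
    for alert in alerts:
        key = (alert["service"], alert["check"])
        incident = incidents.setdefault(
            key, {"key": key, "count": 0, "first_message": alert["message"]}
        )
        incident["count"] += 1  # repeats raise the count instead of paging again
    return list(incidents.values())


# Hypothetical alert stream: three repeats of one problem, plus one other.
raw_alerts = [
    {"service": "web", "check": "cpu", "message": "cpu 91%"},
    {"service": "web", "check": "cpu", "message": "cpu 93%"},
    {"service": "db", "check": "disk", "message": "disk 88% full"},
    {"service": "web", "check": "cpu", "message": "cpu 95%"},
]
bundled = bundle_alerts(raw_alerts)
for incident in bundled:
    print(incident["key"], incident["count"])
```

Four alerts become two incidents, so the on-call engineer sees one notification per underlying problem rather than one per check failure.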
As the first line of defense to encounter these alerts, you have a unique, single view of your system from which to determine whether or not action is required. In response, you can adjust the thresholds required to trigger an event. You have the power to make automation tools like PagerDuty smarter, without unnecessarily waking up anyone on your team.
By implementing this severity-based alerting for your on-call engineers, non-critical alerts that occur at 2:00 AM can wait. Your team will thank you for letting them handle the issue in the morning instead of being woken up in the middle of the night. Also, by prioritizing your time, you can help resolve incidents before anyone even notices, alleviating some of your team’s headaches.
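Severity-based alerting with adjustable thresholds could be sketched roughly as follows. Everything here is an assumption made for illustration: the threshold values, the quiet-hours window of 22:00 to 07:00, and the `classify` and `route` functions are hypothetical and do not correspond to any PagerDuty setting.

```python
from datetime import time

# Assumed quiet-hours window; tune to your team's schedule.
QUIET_START = time(22, 0)
QUIET_END = time(7, 0)


def in_quiet_hours(now):
    """True between QUIET_START and QUIET_END (the window wraps past midnight)."""
    return now >= QUIET_START or now < QUIET_END


def classify(value, warn_at, critical_at):
    """Map a metric reading to a severity using adjustable thresholds."""
    if value >= critical_at:
        return "critical"
    if value >= warn_at:
        return "warning"
    return "ok"


def route(severity, now):
    """Decide what to do with an alert of a given severity at a given time."""
    if severity == "critical":
        return "page"  # always wake someone up
    if severity == "warning":
        # Non-critical alerts wait for the morning during quiet hours.
        return "queue" if in_quiet_hours(now) else "notify"
    return "ignore"


# Disk usage readings checked against warn=80, critical=95.
print(route(classify(97, 80, 95), time(2, 0)))   # page
print(route(classify(85, 80, 95), time(2, 0)))   # queue
print(route(classify(85, 80, 95), time(14, 0)))  # notify
```

Raising `warn_at` or `critical_at` is the code equivalent of adjusting a threshold after you notice that an alert fires without needing action.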