PagerDuty Blog

IT Operations in the Age of Coronavirus

Coronavirus has been a shock to the system for many IT organizations accustomed to working together in person. When you’re in an office, you can often use informal methods of communication, like swinging by someone’s desk, calling them on their office extension, or even passing along critical information when you run into them in the company cafeteria. And when urgent incidents requiring a real-time response occur, you often have a live network operations center (NOC) you can call into. It is staffed 24/7 with personnel ready to respond to incidents, corral the necessary people, and dial the (few) people who are remote into a phone bridge.

Obviously, what was possible weeks ago is no longer possible now. The worldwide—and sudden—mandates from companies and health authorities to make work fully remote have upended all of these processes. What IT organizations need to do today is twofold: automate communication and incident response processes and automate IT tasks.

Automate Communication and Incident Response Processes

IT operations conducted in person can often mean that operational processes are ad hoc, with poorly defined chains of communication. In some sense, it’s why NOCs and their phone bridges or war rooms exist: They are a way to physically assemble people to deal with emergent or unpredictable situations. Now that assembling people in one place is no longer an option, it’s time to invest in establishing standard, predictable workflows that can handle any kind of urgent, real-time operational incident, no matter where your IT staff are located. This is especially critical if you’re in one of the verticals that are being heavily affected by the current crisis, like online education or video collaboration services.

PagerDuty has over 10 years of experience in helping customers establish consistent, predictable incident response processes, and you can benefit from our knowledge by using resources like our Incident Response Guide.

Automate Daily IT Tasks and Remediate Alerts

Incident response processes generally require some action to be taken on systems or applications in order to resolve an incident. Again, when teams are physically co-located, it’s easy for IT professionals to simply log in to systems, perform manual activities such as typing commands and running scripts, and report the results of those activities by voice to the team members assembled in a war room or on a conference bridge.

Once teams are remote, this level of ad-hoc task execution becomes difficult to perform safely. In some situations, such as with offshore managed service providers or highly secure environments, employees may not even be permitted to work remotely. In those cases, automating IT tasks is even more critical, for example so that incidents can kick off auto-remediation actions without someone on site. It’s time to define standard automation recipes to achieve common tasks, reduce errors, and improve knowledge sharing in a world where IT professionals don’t sit next to each other.
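To make the idea of a "standard automation recipe" concrete, here is a minimal Python sketch. It is only an illustration: the `Alert` type, the alert kinds, and the recipe steps are all hypothetical names chosen for this example, not part of any PagerDuty or Ayehu API. It shows one way to keep known fixes in a shared registry and fall back to a human when no recipe exists.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Alert:
    kind: str  # hypothetical alert category, e.g. "disk_full"
    host: str  # hypothetical host name the alert refers to


# A recipe takes an alert and returns the ordered steps it would run.
Recipe = Callable[[Alert], List[str]]

# Shared registry: one documented recipe per known alert kind.
RECIPES: Dict[str, Recipe] = {}


def recipe(kind: str):
    """Decorator that registers a function as the recipe for an alert kind."""
    def register(fn: Recipe) -> Recipe:
        RECIPES[kind] = fn
        return fn
    return register


@recipe("disk_full")
def free_disk_space(alert: Alert) -> List[str]:
    return [f"rotate logs on {alert.host}", f"purge /tmp on {alert.host}"]


@recipe("service_down")
def restart_service(alert: Alert) -> List[str]:
    return [f"restart service on {alert.host}",
            f"verify health check on {alert.host}"]


def remediate(alert: Alert) -> dict:
    """Pick the registered recipe, or hand off to a person if none exists."""
    fn = RECIPES.get(alert.kind)
    if fn is None:
        return {"automated": False,
                "steps": [f"page on-call for {alert.kind} on {alert.host}"]}
    return {"automated": True, "steps": fn(alert)}
```

Because every recipe lives in one registry, the steps themselves double as documentation: a remote teammate can read exactly what will run for a given alert instead of relying on what someone remembers from the war room.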

PagerDuty and Ayehu: A Joint Solution for Incident Response and IT Task Automation

To help teams with automating communication and their incident response processes, PagerDuty has teamed up with Ayehu—a leading provider of automated IT incident remediation—to create a joint solution. Operational issues needing human attention that are detected by monitoring tools or reported by end users can be automatically routed to the right teams in PagerDuty, which can then use Ayehu’s IT automation workflows to troubleshoot or resolve the issue, all without leaving PagerDuty. In our new, 100% remote world, this is an ideal process that can help you virtualize your NOC, replacing manual communication and ad-hoc IT tasks with a process that can seamlessly do both.

You can combine PagerDuty’s six free licenses of PagerDuty Starter (use the code “COVID-19” when signing up) with Ayehu’s five free workflows package. You can connect the two either by using custom incident actions in PagerDuty to initiate Ayehu workflows from a PagerDuty incident, or by incorporating those workflows into a PagerDuty automated response play.
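As a rough illustration of the first option, the Python sketch below shows the kind of glue logic that could sit between an incident action and an automation workflow. Everything specific here is an assumption made for the example: the payload fields (`action_name`, `incident_id`), the workflow identifiers, and the function name are hypothetical and do not reflect the actual PagerDuty payload format or the Ayehu API.

```python
import json

# Hypothetical mapping from an incident action's name to a workflow identifier.
WORKFLOW_FOR_ACTION = {
    "Restart web tier": "wf-restart-web",
    "Collect diagnostics": "wf-collect-diag",
}


def build_workflow_request(raw_body: str) -> dict:
    """Translate an (assumed) incident-action payload into a workflow request."""
    event = json.loads(raw_body)
    action = event.get("action_name")      # assumed field name
    incident_id = event.get("incident_id")  # assumed field name

    if not incident_id:
        raise ValueError("payload is missing incident_id")

    workflow = WORKFLOW_FOR_ACTION.get(action)
    if workflow is None:
        # Unknown actions are ignored rather than guessed at.
        return {"status": "ignored",
                "reason": f"no workflow mapped to {action!r}"}

    # Carry the incident id along so results can be reported back to it.
    return {"status": "queued", "workflow": workflow,
            "incident_id": incident_id}
```

Keeping the incident identifier attached to the workflow request is the important design choice: it lets the outcome of the automation be reported against the same incident, so responders do not have to leave the incident to see what happened.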

To learn more about how PagerDuty and Ayehu are working together to help you rapidly re-engineer IT processes and improve communications between IT teams during major incidents, please click here.