Running Operations Is Hard. PagerDuty + Rundeck Are Here to Help

by Damon Edwards | November 16, 2020

Rundeck has now joined forces with PagerDuty. What pulled us together? Our shared vision for improving the work lives of those who run modern digital services.

As a co-founder of Rundeck, I’d like to provide my perspective on why Rundeck becoming part of the PagerDuty family is a perfect fit for our collective user communities.

Sharing a Common Vision

Whether you are on a “you build it, you run it” DevOps team or part of a centralized Ops team, operations work has always been difficult. Today, it’s even more so.

Every second of uptime and every bit of performance counts. Application and infrastructure complexity is skyrocketing. The pace of change is relentlessly increasing. The stakes have never been higher. And where does all of this pressure fall? On those who must answer the call when things go wrong.

At Rundeck, we’ve always championed those who do operations work—the unsung heroes of today’s digital businesses.

The more we got to know the PagerDuty team during our pre-acquisition partnership, the more we realized that our cultures were aligned. We both exist to help the human operators of the digital services that keep our modern global economy running.

Incident response is how we make the most impact for our users today. We help reduce the cost, chaos, disruption, finger-pointing, and stress when digital services break, making a positive impact on engineers’ lives and improving the bottom line of their employers.

PagerDuty + Rundeck = Shorter Incidents and Fewer Escalations

The addition of Rundeck to the PagerDuty platform delivers an end-to-end solution that resolves incidents faster (including before the customer is impacted) while reducing the stress and disruption that traditionally plague operations work.

Figure: Examples of what Rundeck adds to PagerDuty.

By adding Rundeck to PagerDuty, you get fewer incidents, and when incidents do occur, time to resolution is drastically reduced and fewer escalations are needed. This makes the business happier and relieves the stress and frustration of the teams that operate digital services.

What Is Rundeck?

Rundeck is runbook automation, which gives anyone in your organization self-service access to the operations capabilities that previously only your subject matter experts (SMEs) could perform. Through Rundeck, responders can safely and intelligently execute workflows that invoke your existing tools, scripts, API calls, and manual commands.

However, it’s important to note that Rundeck doesn’t replace your existing automation. Much like PagerDuty, Rundeck is designed to accept the reality that heterogeneous infrastructure and tooling are a fact of life in any sizable organization. Rather than replacing what you have, Rundeck encourages you to reuse the automation skills you already have and to add new ones only as needed.

Rundeck’s access control features make it easy to safely delegate control of operations tasks to your colleagues, whether they are on your team or a different team (either inside or outside the traditional boundaries of Operations). Rundeck also captures an audit trail of all actions taken so you can remain compliant.

What kinds of improvements are possible with Rundeck? We routinely hear of companies cutting their incident response times by 40-60% and reducing their escalations by 50% where they use Rundeck vs. where they don’t. One company, telling its story in its own words from a conference stage, described saving 28 person-years in the first 12 months, before it had even rolled out Rundeck widely across the business!

How Is Rundeck Used With PagerDuty Today?

Even before Rundeck was acquired, we saw enterprises in our community putting Rundeck together with PagerDuty to shorten incidents and reduce escalations.

We observed a common problem pattern among companies where PagerDuty and Rundeck were both being used. These companies all had some automation for their operations procedures (and plenty of gaps featuring manual steps). In most cases, some specific up-to-date knowledge or expertise was required to invoke the right scripts, commands, and tools at the right time to accomplish the desired outcome. As a result, even in teams full of highly skilled engineers, bottlenecks formed around specific individuals.

During most incident response scenarios, this meant that the clock was ticking while initial responders attempted trial-and-error solutions and/or sought out the SME(s) who could help, making incidents longer and causing more disruptive interruptions than necessary.

Figure: Before PagerDuty + Rundeck.

Adding PagerDuty helped those companies speed the detection of incidents and mobilize the correct responders. But that is only part of the lifecycle of an incident, and the clock is still ticking. So how do you enable those responders to take action? That’s where Rundeck comes in.

Rundeck safely puts expert automated operations procedures into the hands of anyone responding to an incident. With Rundeck, initial responders can safely use the same diagnostic or repair procedures that previously only SMEs could perform, dramatically reducing the time to both diagnose and resolve an incident.

Sure, your initial responders won’t be able to do everything that your SMEs can perform. However, you would be surprised how much we see organizations getting done with Rundeck, including:

  • Diagnostics and health checks across the stack
  • Recovering from known problems
  • Performance analysis
  • Configuration reports
  • Restarts
  • Scaling
  • Rollbacks
  • And more

Figure: After PagerDuty + Rundeck.

Leveraging the alert management and incident response data within the PagerDuty platform, Rundeck is often used for automated diagnostics and auto-remediation. At the start of an incident, anyone can start the investigation or even take corrective action before the main responder has had time to log in.

All the features of this integration are available today. With PagerDuty’s event automation, you can trigger auto-diagnostic or auto-remediation procedures to be executed by Rundeck. From PagerDuty custom actions, responders can trigger Rundeck to take diagnostic or repair actions. From within Rundeck workflows, users can update PagerDuty incident timelines, change incident priorities, escalate, run a response play, and a whole lot more.
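To make the two directions of this integration a little more concrete, here is a minimal Python sketch, not the official integration. It only builds the two requests and sends nothing. It assumes Rundeck's token-authenticated "run job" endpoint and the PagerDuty Events API v2 payload shape. The host name, job ID, token, routing key, option names, and the default API version number are all hypothetical placeholders, so check them against your own installation.

```python
import json
import urllib.request

# Public PagerDuty Events API v2 endpoint.
PAGERDUTY_EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"


def build_rundeck_run_request(base_url, job_id, api_token, options, api_version=41):
    """Build (but do not send) a request asking Rundeck to run a job.

    api_version=41 is an assumed default; use the version your server supports.
    """
    url = f"{base_url.rstrip('/')}/api/{api_version}/job/{job_id}/run"
    body = json.dumps({"options": options}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "X-Rundeck-Auth-Token": api_token,
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


def build_pagerduty_event(routing_key, summary, source, severity="info"):
    """Build the JSON body for a PagerDuty Events API v2 'trigger' event."""
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {"summary": summary, "source": source, "severity": severity},
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    request = build_rundeck_run_request(
        "https://rundeck.example.com", "restart-checkout", "TOKEN", {"service": "checkout"}
    )
    print(request.get_method(), request.full_url)
    event = build_pagerduty_event("ROUTING_KEY", "Diagnostics finished", "rundeck")
    print(PAGERDUTY_EVENTS_URL, json.dumps(event))
```

In practice, the first request would be sent by whatever handles a PagerDuty custom action or event rule, and the second would be posted from a Rundeck job step. Existing plugins and integrations may cover both without any custom code.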

What’s Next?

If you are a fan of Rundeck, you can continue to use Rundeck as you do today. The product line will continue within PagerDuty. Obviously, we hope you’ll also give PagerDuty a try if you aren’t already a user.

Stay tuned for lots of product innovation from our new combined teams, including new and improved integrations!