PagerDuty Blog

Borrow Expertise With Runbook Automation

Every team has its experts. Maybe you’re the expert for a segment of your team’s applications: the person who’s always called when there’s a problem, when something unexpected happens, or when things just look “weird” and the solution isn’t simple. Maybe there are two of you or even, if you’re lucky, three!

It’s natural to want to call in The Best when the unexpected happens, but The Best might be spending the evening celebrating their child’s birthday, or going to the movies, or just relaxing. When teams are the responders for their applications and are responsible for making things right when they go wrong, experts can feel overwhelmed, overworked, or both. So we want to find another way to leverage the knowledge of our experts without needing them to be on call 24×7. Automation can help us with this.

What Is Runbook Automation?

Runbooks are repeatable and automated standard operating procedures for common tasks. A runbook is expert knowledge coalesced into an artifact in your environment that can be used over and over to accomplish tasks. Some of the tasks might be related to incident response to help shorten your mean time to recover (MTTR), while other tasks can be things your team needs (like requesting a new host in a cloud platform) when alerts aren’t alerting and alarms aren’t alarming.

Borrow Expertise From Your Experts

Have you ever wished you could clone someone on your team because they know all the important things? Well, we can’t clone them, but we can extract some of their knowledge and make it available for everyone to use. This is especially important when it comes to incident management and response.

What happens when your senior team members respond to an incident? Do they check certain log files? Look for status codes or error messages? Check the status of a service? The steps they take to figure out what is going on with the environment might have been learned over years of working with the applications, but you don’t want everyone on your team spending years learning all of those tricks. If your team is constantly re-learning what the senior people already know, they could get stuck in a rut and fail to improve.

Automation can help your team “borrow” expertise from your experts. More than documentation—which might be in some dark corner of your wiki gathering dust—automation components can be a part of the application as it’s produced. Think of it as the answer to “What would The Best do?” Maybe they’d start with a couple of commands and see what happens. Maybe they’d restart a service or clear a cache—the point is, automation like this allows your whole team to leverage all the collected experiences of every team member. No one has to be the expert on all of the applications; instead, they can borrow expertise from everyone else on the team through automation.
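To make the “What would The Best do?” idea concrete, here is a minimal sketch of an expert’s first diagnostic steps captured as code. Everything in it is an assumption made for illustration: the Step class, the two checks (scanning a log for error lines and inspecting a status code), and the sample data are hypothetical, not part of any PagerDuty or Rundeck API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A check returns (ok, detail): whether the step passed, plus a note for the responder.
CheckResult = Tuple[bool, str]


@dataclass
class Step:
    """One diagnostic step an expert would normally run by hand (hypothetical)."""
    name: str
    check: Callable[[], CheckResult]


def scan_log_for_errors(log_lines: List[str]) -> CheckResult:
    """Hypothetical check: flag any log line containing 'ERROR'."""
    errors = [line for line in log_lines if "ERROR" in line]
    return (not errors, f"{len(errors)} error line(s) found")


def check_status_code(status_code: int) -> CheckResult:
    """Hypothetical check: treat any HTTP 5xx response as a failure."""
    return (status_code < 500, f"health endpoint returned {status_code}")


def run_runbook(steps: List[Step]) -> List[Tuple[str, bool, str]]:
    """Run every step in order and collect results, even if a step fails."""
    results = []
    for step in steps:
        try:
            ok, detail = step.check()
        except Exception as exc:  # a broken check should not hide later steps
            ok, detail = False, f"check raised {exc!r}"
        results.append((step.name, ok, detail))
    return results


# Sample data standing in for a real log file and a real health check.
sample_log = ["INFO started", "ERROR db timeout", "INFO retrying"]
runbook = [
    Step("Scan application log", lambda: scan_log_for_errors(sample_log)),
    Step("Check health endpoint", lambda: check_status_code(200)),
]

if __name__ == "__main__":
    for name, ok, detail in run_runbook(runbook):
        print(f"[{'OK' if ok else 'FAIL'}] {name}: {detail}")
```

Because each step is just a named function, anyone on the team can add the check they always reach for first, and whoever is on call gets the same starting point the expert would have.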

The Benefits of Automation

Automating common tasks saves time and can help reduce errors. One common way to approach this type of automation is through runbooks, which can also help additional members of your organization become more independent. For example, perhaps your team is responsible for provisioning new instances in a cloud environment. Depending on the platform and provider you use, the number of potential options is huge; some will be expensive, some might not meet your security requirements, and some will be close to, but not quite, what should be used. To prevent potential mayhem in your environment, your team of experts can create a runbook that provisions approved instances in the right locations for your organization and uses the approved options, every time.
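The provisioning example might look something like the sketch below. It is only an illustration: the allow-lists, the InstanceRequest fields, and the encrypted_disk default are hypothetical stand-ins for whatever your experts have approved, and the function returns a plan rather than calling any real cloud provider API.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical allow-lists; a real team would maintain its own approved values.
APPROVED_SIZES = {"small", "medium"}
APPROVED_REGIONS = {"region-a", "region-b"}


@dataclass(frozen=True)
class InstanceRequest:
    size: str
    region: str
    owner: str


def validate_request(request: InstanceRequest) -> List[str]:
    """Return a list of problems; an empty list means the request is approved."""
    problems = []
    if request.size not in APPROVED_SIZES:
        problems.append(f"size '{request.size}' is not approved")
    if request.region not in APPROVED_REGIONS:
        problems.append(f"region '{request.region}' is not approved")
    if not request.owner:
        problems.append("an owner is required")
    return problems


def provision(request: InstanceRequest) -> Dict[str, object]:
    """Build the provisioning plan for an approved request, or raise ValueError."""
    problems = validate_request(request)
    if problems:
        raise ValueError("; ".join(problems))
    # A real runbook would call the cloud provider here. This sketch only
    # returns the plan, with an assumed security default applied every time.
    return {
        "size": request.size,
        "region": request.region,
        "owner": request.owner,
        "encrypted_disk": True,
    }


if __name__ == "__main__":
    print(provision(InstanceRequest("small", "region-a", "payments-team")))
```

The requester only chooses among options the experts have already vetted, so the expensive or non-compliant choices never reach the cloud platform.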

You can combine runbooks with a platform like Rundeck in order to:

  • Provide more self-service to your teams and teams you work with
  • Give your experts more time to work on more challenging or complex issues
  • Enable self-sufficiency by providing access and permissions to disparate teams without compromising your security posture

Learn More About Runbooks and Automation

Runbooks and automation are a huge part of managing modern technologies. If you’re new to these ideas, check out our new Automated Remediation Ops Guide. You can also learn more about automating runbooks from Rundeck or by checking out our latest webinar about PagerDuty’s integration with Rundeck.

Do you have an automation story to share? We’d love to hear it! Join our PagerDuty Community and let us know what your team has been automating.