Dutonian Story

Platform Engineering links incidents to runbooks for faster diagnostics and remediation

Learn how one team improved knowledge sharing and reduced troubleshooting time by automatically linking relevant runbooks to incidents through custom fields and orchestration rules.

Phase 1

The Challenge

How They Were Working

The Platform Engineering team relied on tribal knowledge and manual searches through internal wikis to find runbooks when responding to specific incidents.

Before workflow diagram

Pain Points

Tribal Knowledge

On-call responders relied on tribal knowledge to know if a runbook existed for a particular alert.

Lack of Context

On-call responders did not have enough context to know what to do next when paged about an alert.

Manual Toil

On-call responders had to manually search through internal wiki pages to find the appropriate runbook.

Key Challenge

Accessing the right runbook for the right incident without relying on tribal knowledge and manual lookups that added time to response.

Phase 2

The Solution

What They Did

1. Create links to dedicated runbooks in the internal wiki site

2. Create an incident custom field to host a link to a runbook (requires Admin access)

3. Create an orchestration rule that populates the incident custom field with a link to the runbook based on event conditions
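The logic behind step 3 can be sketched in a few lines of Python. This is only an illustration of the idea, not the product's rule syntax or API: the `RunbookRule` class, the `runbook_link` field name, the event keys, and the wiki URLs are all hypothetical placeholders. Each rule pairs a set of event conditions with a runbook link, and the first rule whose conditions all match fills in the custom field.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunbookRule:
    """Hypothetical rule: if every condition matches, attach this runbook."""
    conditions: dict[str, str]  # event field -> text that must appear in it
    runbook_url: str


# Placeholder rules; field names and URLs are assumptions for illustration.
RULES = [
    RunbookRule({"service": "checkout", "summary": "high latency"},
                "https://wiki.example.com/runbooks/checkout-latency"),
    RunbookRule({"summary": "disk usage"},
                "https://wiki.example.com/runbooks/disk-usage"),
]


def match_runbook(event: dict[str, str],
                  rules: list[RunbookRule]) -> Optional[str]:
    """Return the runbook URL of the first rule whose conditions all match."""
    for rule in rules:
        if all(needle.lower() in event.get(key, "").lower()
               for key, needle in rule.conditions.items()):
            return rule.runbook_url
    return None


def apply_rules(event: dict[str, str],
                rules: list[RunbookRule]) -> dict[str, str]:
    """Build the incident's custom fields from an incoming event."""
    custom_fields: dict[str, str] = {}
    url = match_runbook(event, rules)
    if url is not None:
        # Only set the field when a rule matched, so unmatched
        # incidents are easy to spot in later reviews.
        custom_fields["runbook_link"] = url
    return custom_fields
```

Rules are evaluated in order, so more specific conditions belong above broader ones. An event that matches no rule leaves the field empty rather than pointing at a generic page.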

Phase 3

The Results

How They're Working Now

After workflow diagram

With orchestration rules and custom fields, the team can now automatically attach specific runbooks to specific incidents for faster diagnostics and remediation.

Wins

Improved knowledge sharing

New team members onboard faster to the on-call rotation by accessing runbooks.

Contextualized response

On-call responders have immediate context on what the alert is about and how to diagnose and troubleshoot it.

Increased efficiency

On-call responders work more efficiently with less manual toil.

Outcomes

Faster onboarding

New team members onboard faster to the on-call rotation by accessing runbooks directly from incidents.

Reduced time spent looking for runbooks

Eliminated manual searches through documentation to find relevant runbooks.

Reduced time spent troubleshooting issues

Immediate access to diagnostic procedures accelerated problem resolution.

Reduced time spent on incidents

Overall incident duration decreased with faster access to remediation steps.

Lessons Learned & Tips

  • Identify the alert conditions that should map to each runbook.
  • Conduct weekly on-call reviews to identify repetitive incidents that are diagnosed or resolved with runbooks. Link those runbooks to incidents using orchestration rules and custom fields.
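The weekly review in the second tip can be partly automated. The sketch below assumes the week's incidents have been exported as a list of dictionaries with a `title` key and an optional `runbook_link` key; both names and the `min_count` threshold are hypothetical. It lists the alerts that fired repeatedly without a runbook attached, which are the natural candidates for a new rule.

```python
from collections import Counter


def repeat_candidates(incidents, min_count=2):
    """Return (title, count) pairs for alerts that fired at least
    min_count times without a runbook link, most frequent first."""
    counts = Counter(
        inc["title"] for inc in incidents
        if not inc.get("runbook_link")  # missing, None, or empty string
    )
    return [(title, n) for title, n in counts.most_common()
            if n >= min_count]
```

The threshold is a judgment call: a lower value surfaces more candidates, while a higher one focuses the review on the noisiest alerts.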

Ready to streamline your incident response with runbooks?

Start your free trial today and see the difference.
