Dutonian Story
Platform Engineering links incidents to runbooks for faster diagnostics and remediation
Learn how one team improved knowledge sharing and reduced troubleshooting time by automatically linking relevant runbooks to incidents through custom fields and orchestration rules.
The Challenge
How They Were Working
The Platform Engineering team relied on tribal knowledge and manual searches through internal wikis to find runbooks when responding to specific incidents.
Pain Points
Tribal Knowledge
On-call responders relied on tribal knowledge to know if a runbook existed for a particular alert.
Lack of context
On-call responders did not have enough context to know what next steps to take when paged about an alert.
Manual Toil
On-call responders had to manually search through internal wiki pages to find the appropriate runbook.
Key Challenge
Accessing the right runbook for the right incident without relying on tribal knowledge and manual lookups that added time to response.
The Solution
What They Did
Create links to dedicated runbooks on the internal wiki site
Create an incident custom field to host a link to a runbook (requires Admin access)
Create an orchestration rule that populates the incident custom field with a link to the runbook based on event conditions
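The rule logic in the steps above can be sketched in a few lines of Python. This is a conceptual illustration only, not PagerDuty's API or rule syntax: the `RunbookRule` class, the `runbook_link` field name, the substring-matching conditions, and the `wiki.example.com` URLs are all hypothetical stand-ins for whatever conditions and links a team actually configures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunbookRule:
    """Hypothetical rule: link runbook_url when the event summary contains a phrase."""
    summary_contains: str
    runbook_url: str


# Assumed example rules; a real team would derive these from its own alerts.
RULES = [
    RunbookRule("disk usage", "https://wiki.example.com/runbooks/disk-usage"),
    RunbookRule("certificate expir", "https://wiki.example.com/runbooks/tls-certificates"),
]


def runbook_for_event(event: dict, rules=RULES) -> Optional[str]:
    """Return the runbook URL of the first rule whose condition matches, else None."""
    summary = str(event.get("summary", "")).lower()
    for rule in rules:
        if rule.summary_contains.lower() in summary:
            return rule.runbook_url
    return None


def apply_runbook_field(incident: dict, event: dict, field_name: str = "runbook_link") -> dict:
    """Return a copy of the incident with the custom field set when a rule matches."""
    url = runbook_for_event(event)
    custom_fields = dict(incident.get("custom_fields", {}))
    if url is not None:
        custom_fields[field_name] = url
    return {**incident, "custom_fields": custom_fields}
```

For example, `apply_runbook_field({"id": "INC1"}, {"summary": "Disk usage above 90% on db-1"})` would return an incident whose `runbook_link` custom field points at the disk-usage runbook, while an event matching no rule leaves the field unset.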
The Results
How They're Working Now
With orchestration rules and custom fields, the team can now automatically attach specific runbooks to specific incidents for faster diagnostics and remediation.
Wins
Improved knowledge sharing
New team members onboard faster to the on-call rotation by accessing runbooks.
Contextualized response
On-call responders have immediate context on what the alert is about and how to diagnose and troubleshoot it.
Increased efficiency
On-call responders work more efficiently with less manual toil.
Outcomes
Faster onboarding
New team members onboard faster to the on-call rotation by accessing runbooks directly from incidents.
Reduced time spent looking for runbooks
Eliminated manual searches through documentation to find relevant runbooks.
Reduced time spent troubleshooting issues
Immediate access to diagnostic procedures accelerated problem resolution.
Reduced time spent on incidents
Overall incident duration decreased with faster access to remediation steps.
Lessons Learned & Tips
- Identify the right alert conditions to link to the proper runbook
- Conduct weekly on-call reviews to identify repetitive incidents that are diagnosed or resolved with runbooks. Link those runbooks to incidents using orchestration rules and custom fields.
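One way to support the weekly review is to count which alerts recurred without a linked runbook. The sketch below assumes incidents have been exported as dictionaries with a hypothetical `alert_key` identifier and an optional `runbook_link` value; neither name comes from the original write-up.

```python
from collections import Counter


def runbook_candidates(incidents, min_count=2):
    """List (alert_key, count) pairs for repeated alerts that have no runbook linked.

    Results are ordered from most to least frequent.
    """
    counts = Counter(
        inc["alert_key"] for inc in incidents if not inc.get("runbook_link")
    )
    return [(key, n) for key, n in counts.most_common() if n >= min_count]
```

Alerts that surface in this list are candidates for a new runbook and a matching orchestration rule.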