Continuous learning

How to learn from past incidents

Good vs better vs best practices for learning from past incidents in PagerDuty.

When Does This Matter/Problem Scenario

Every incident is an opportunity to learn and to improve systems, processes, or infrastructure.

Why You Should Care

Learning from past incidents helps teams identify root causes, strengthen processes, and prevent repeat failures, improving reliability over time. It also builds a culture of continuous improvement and shared knowledge, so everyone responds faster and smarter when the next incident occurs.

PagerDuty Practices

There are several approaches to learning from past incidents, ranging from simple documentation to comprehensive post-incident reviews and AI-assisted learning.


Description of Practices

Good

Add resolution notes to incidents so that future responders facing a similar incident can see what actions you took to resolve it.
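Notes can also be attached programmatically, which is handy when a runbook or script already knows what fixed the problem. The following is a minimal sketch, assuming the PagerDuty REST API v2 incident notes endpoint (`POST /incidents/{id}/notes`) and token authentication. Whether a note created this way appears in the same place as a resolution note entered in the UI is an assumption, and the function name, incident ID, token, and email address are hypothetical placeholders. The function only builds the request and does not send it.

```python
import json
import urllib.request

API_BASE = "https://api.pagerduty.com"


def build_note_request(incident_id, note_text, api_token, from_email):
    """Build (but do not send) a request that adds a note to an incident.

    Assumes the REST API v2 notes endpoint and token auth; the `From`
    header identifies the PagerDuty user adding the note.
    """
    body = json.dumps({"note": {"content": note_text}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/incidents/{incident_id}/notes",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token token={api_token}",
            "Accept": "application/vnd.pagerduty+json;version=2",
            "Content-Type": "application/json",
            "From": from_email,
        },
    )


# Hypothetical values for illustration only.
request = build_note_request(
    incident_id="PABC123",
    note_text="Restarted the stuck worker; queue drained within 5 minutes.",
    api_token="YOUR_API_TOKEN",
    from_email="responder@example.com",
)
# To actually send it: urllib.request.urlopen(request)
```

Keeping the request construction separate from the network call makes it easy to review exactly what would be sent before pointing it at a real account.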

Better

Use PagerDuty Analytics, PagerDuty’s Insights Agent, or both to review incidents from the past week, and bring the findings to weekly on-call handoff reviews. Use incident workflows to log post-incident action items in a ticketing system.
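For teams that want a quick supplement to those tools, the weekly review can be approximated from exported incident data. This is a rough sketch, not a replacement for PagerDuty Analytics: it assumes each incident is a dictionary with an ISO 8601 `created_at` timestamp and a `service` object carrying a `summary` name, which mirrors the shape of incident records in the REST API but should be checked against your own export. The function name and sample data are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def summarize_week(incidents, now):
    """Count incidents per service created in the 7 days before `now`.

    Returns (service_name, count) pairs, busiest service first.
    """
    cutoff = now - timedelta(days=7)
    counts = Counter()
    for inc in incidents:
        # Accept a trailing "Z" as UTC so fromisoformat works on older Pythons.
        created = datetime.fromisoformat(inc["created_at"].replace("Z", "+00:00"))
        if cutoff <= created <= now:
            counts[inc["service"]["summary"]] += 1
    return counts.most_common()


# Hypothetical sample data for illustration only.
sample = [
    {"created_at": "2024-05-06T10:00:00Z", "service": {"summary": "checkout"}},
    {"created_at": "2024-05-07T02:30:00Z", "service": {"summary": "search"}},
    {"created_at": "2024-05-08T14:15:00Z", "service": {"summary": "checkout"}},
    {"created_at": "2024-04-01T09:00:00Z", "service": {"summary": "checkout"}},
]
handoff_time = datetime(2024, 5, 10, 9, 0, tzinfo=timezone.utc)
weekly_counts = summarize_week(sample, handoff_time)
```

A per-service count like this gives the handoff meeting a starting point: the services at the top of the list are the first candidates for discussion.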

Best

For major incidents, conduct a full post-incident review and link post-incident action items to the review. For day-to-day incidents, teach the SRE Agent what you did to troubleshoot and resolve the incident so that it can share those learnings with the next on-call responder.

To collaborate quickly on incidents, many engineering teams at PagerDuty send their incident notifications to a dedicated team Slack channel, where they can immediately thread conversations about each incident.

To engage responders immediately on specific incidents, the PagerDuty security team auto-triggers incident workflows that bring responders into a single Slack channel with relevant incident data so they can begin troubleshooting.