Many organizations depend on PagerDuty, the hub that centralizes DevOps and IT Operations tool data, to notify them whenever any component of their IT infrastructure behaves unexpectedly. If you’ve used PagerDuty before, you’ve likely had to deal with multiple incidents related to a single problem, each triggering its own notification. This typically happens when you have redundant monitoring systems configured, or when a single point of failure or degradation sets off a domino effect in which multiple tools fire alerts simultaneously.
To address this, we’ve made significant changes to our data model by redefining an alert within PagerDuty as an object that tracks the state of a monitoring tool. Alerts are foundational to two exciting new capabilities: Alert Triage and Suppression.
Introducing Alert Triage
With the new Alert Triage capability, you can group related alerts into a single incident object that enables true end-to-end incident management. Responders no longer get paged on individual, siloed symptoms. Instead, resolution workflows are centered on an incident object that represents a real, service-impacting problem or outage. This capability redefines how customers can triage and interact with the data from their systems to reduce noise, improve cross-functional collaboration, and drive down resolution times.
Alerts are automatically enabled on new PagerDuty services, so you can begin using the new Alert Triage features immediately. For existing services where it makes sense to have this configured, simply click on “Edit Service” and toggle on the option to “Create alerts & incidents”.
When a service is configured to “Create alerts & incidents”, each actionable alert creates its own parent incident. To consolidate related alerts into a single incident, select two or more incidents on the incident list, press Merge, and choose the incident that the others should be merged into.
When you’re merging multiple incidents together, you can easily edit the incident summary to accurately reflect the issue in question, so responders can quickly get up to speed.
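The merge step described above can be pictured with a small in-memory model. This is only an illustrative sketch, not PagerDuty's implementation or API: the `Alert` and `Incident` classes, their fields, and the `merge_incidents` helper are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Alert:
    """One signal from a monitoring tool (hypothetical shape)."""
    source: str
    summary: str


@dataclass
class Incident:
    """The object responders are paged on; it owns its child alerts."""
    summary: str
    alerts: List[Alert] = field(default_factory=list)


def merge_incidents(
    target: Incident,
    sources: List[Incident],
    summary: Optional[str] = None,
) -> Incident:
    """Move every alert from the source incidents into the target incident."""
    for incident in sources:
        target.alerts.extend(incident.alerts)
        incident.alerts.clear()
    # Optionally rewrite the summary so it describes the underlying problem.
    if summary is not None:
        target.summary = summary
    return target


# Two monitoring tools report the same outage as separate incidents.
db_incident = Incident("DB latency high", [Alert("tool-a", "DB latency high")])
api_incident = Incident("API 5xx spike", [Alert("tool-b", "API 5xx spike")])

merged = merge_incidents(
    db_incident, [api_incident], summary="Database outage affecting API"
)
```

After the call, `merged` holds both alerts under one summary, which mirrors the idea that responders work a single incident rather than two separate ones.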
Alert Triage Benefits
Alert Triage offers several benefits that make for a more seamless incident resolution workflow.
- Centralize critical alert information – Instead of dealing with multiple alerts independently, with no correlation or consolidation, responders can now investigate a single incident to get up to speed quickly.
- See all impacted services – Quickly identify all the services that have been impacted by the incident.
- Streamline cross-functional handoff – This is especially valuable for NOC and first-level responders, who now only have to interact with a single object for reassignment instead of manually reassigning or escalating individual alerts.
- Reduce alert fatigue – Responders now only get paged on a single incident with all consolidated alert context, instead of multiple times from siloed tools sending redundant alerts.
- Establish incident command for improved collaboration – Response workflows, such as responder mobilization and conferencing, are now centered around the enhanced incident object with all relevant context, streamlining communications.
- Leverage bulk actions for enhanced speed – When an action gets taken on a parent incident, it automatically gets applied to all the child alerts, and vice versa.
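The bulk-action behavior in the last bullet can be sketched as status propagation between a parent incident and its child alerts. This is a hypothetical model, not PagerDuty's code: the class names, the `status` field, and both methods are invented for illustration. In particular, the reverse direction shown here (the parent resolves once every child alert is resolved) is an assumption about what "vice versa" means.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Alert:
    summary: str
    status: str = "triggered"


@dataclass
class Incident:
    summary: str
    alerts: List[Alert] = field(default_factory=list)
    status: str = "triggered"

    def resolve(self) -> None:
        """Parent to children: resolving the incident resolves every alert."""
        self.status = "resolved"
        for alert in self.alerts:
            alert.status = "resolved"

    def resolve_alert(self, alert: Alert) -> None:
        """Children to parent (assumed): the incident resolves with its last alert."""
        alert.status = "resolved"
        if all(a.status == "resolved" for a in self.alerts):
            self.status = "resolved"


# Acting on the parent applies to all child alerts.
outage = Incident("Checkout outage", [Alert("CPU high"), Alert("Disk full")])
outage.resolve()

# Acting on the children: the parent stays open until the last alert resolves.
degraded = Incident("Search degraded", [Alert("Slow queries"), Alert("Cache misses")])
degraded.resolve_alert(degraded.alerts[0])
status_after_first = degraded.status
degraded.resolve_alert(degraded.alerts[1])
```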
Alerts and the new Alert Triage capability are a critical building block for unlocking enhanced value within PagerDuty, and they are available to all customers at no additional cost. We highly encourage you to learn more by reading the related support articles.
Don’t hesitate to reach out to email@example.com with any questions or feedback; we look forward to hearing from you. We hope that with Alert Triage, you and your teams enjoy the benefits of optimized incident response.