What is Event Orchestration? 7 ways to start using this powerful new feature from PagerDuty to reduce noise and automate away manual toil today
Does your team deal with too much noise? Does your heart sink a bit when you think about how much your rulesets have sprawled in order to manage your event processing needs? That’s why we released Event Orchestration earlier this year to help teams reduce the amount of manual work that goes into event management. Event Orchestration is the next evolution of our Event Rules feature set, which helps to route, enrich, and modify events on ingest to remove noise and automate processes.
We took Event Rules and supercharged it to handle more complex, custom logic and more sophisticated conditional event processing. We even wrote our own condition language (PagerDuty Condition Language, or PCL, pronounced “pickle”) to enable this. You can learn how we built it from Staff Engineer Barry Kim’s Summit session “PCL 101” here.
Event Orchestration is now the best way for users to compress rule volumes, improve noise reduction, and more effectively automate away well-understood manual work. To dedicate our resources to building the most robust and reliable event-driven enrichment and automation experience for our customers, we recently announced that we will End-of-Life Event Rules and migrate all customers to Event Orchestration early next year. For more information about this and the various migration options, see this Knowledge Base article.
In this blog, I’m going to walk through how Event Orchestration is different from Event Rules and review seven common use cases for Event Orchestration that we’re seeing make the most impact for our customers.
What is Event Orchestration? And how’s it different from Event Rules?
Event Orchestration is a direct upgrade from Event Rules. Basic Event Orchestrations can perform all the same basic event processing actions that event rules can perform with the added benefits of improved UI, better rule creation, APIs and Terraform support, and advanced conditions. For customers with the Event Intelligence add-on or Digital Operations plans, Advanced Event Orchestrations bring even more functionality to the table, including contextual conditions, webhooks, paused incident notifications, rule nesting, and a direct integration with Automation Actions.
Below are a few of the key ways that Event Orchestration is superior to Event Rules:
- Easier to use: Architecturally, Event Orchestration takes advantage of PagerDuty’s more modern approach to front-end development by leveraging React as its core frontend stack. This allows customers to navigate their rules with less lag and greater support for accessibility improvements in the future.
- More complex event processing: Because of the condition language that Event Orchestration supports and the capability to nest rules, customers using Event Orchestration can perform complex event processing actions with a fraction of the configuration effort. What could once be accomplished with 10 event rules can now be done with 1 Event Orchestration rule.
- More robust support for automation: Users can trigger webhooks with custom headers or automation actions.
- More precise event processing: Rule nesting allows users to execute automations with a high degree of precision, as customers can itemize each known failure state for their systems in detail and deploy automation to each with confidence.
What are the most common use cases for Event Orchestration?
With all this additional functionality, I hope it’s clear that Event Orchestration has the potential to significantly improve your team’s experience as a part of major and minor incident response. But where should people get started?
One of the most popular sessions in our on-demand video library at Summit 2022 was 7 Ways to Use Event Orchestration to Reduce Noise and Automate More Often. In the session, Professional Services Consultant Eddie Willits, joined by Senior Product Manager Frank Emery, walks through Event Orchestration and the most common use cases that customers are applying this powerful new capability to. I’ve summarized them below, but if you’re an audio/visual learner, you can also watch their quick 20-minute session.
Here are the 7 most common use cases for Event Orchestration today:
1) Reducing Noise
The trouble with noise is that it’s very distracting. It’s especially annoying when it wasn’t even worth stopping what you were doing to look at it in the first place. Classic examples of this would be events coming from a staging environment or non-critical development events that are sent after hours. How can you ensure that your team only works on the incidents that matter?
Event Orchestration can help teams stay focused by interrupting responders only with the most important, time-critical alerts. You can design an orchestration that looks for a certain type of low-priority signal and uses PagerDuty’s Pause Incident Notification to handle irrelevant, low-value, or distracting events by automatically downgrading or suppressing them entirely. Instead of spending time acknowledging distracting events, responders can stay focused on critical alerts affecting the business.
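To make the idea concrete, here is a minimal Python sketch of the kind of decision such a rule encodes. It is not PagerDuty's API or condition language: the `Event` class, the `environment` field, and the action names are all hypothetical, chosen only to illustrate the logic.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical, simplified event; not PagerDuty's actual schema."""
    summary: str
    severity: str = "info"
    custom_details: dict = field(default_factory=dict)


def is_low_value(event: Event) -> bool:
    """Return True for events that should not interrupt a responder."""
    environment = event.custom_details.get("environment", "")
    # Assumption: staging/dev traffic and informational events are noise.
    return environment in {"staging", "dev"} or event.severity == "info"


def process(event: Event) -> str:
    """Return the (hypothetical) action name for an incoming event."""
    return "suppress" if is_low_value(event) else "notify"
```

In a real orchestration the same condition would be expressed in the rule editor rather than in code; the sketch only shows how a single predicate separates distracting events from the ones worth paging on.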
2) Automated maintenance windows
How often are you thinking “I’m performing maintenance at midnight tonight! How do I make sure that service owners are not woken up?”
Event Orchestration supports this use case with custom logic that accommodates recurring or scheduled rule conditions. Customers can define when all alerts should be suppressed or re-routed to support an ongoing or planned maintenance window. You can even get more specific than a blanket maintenance window per service by setting up rules that handle alerts differently depending on the monitoring tool that sent them. One example we’ve seen customers lean into is an orchestration that adjusts the severity of production events depending on whether they arrive during on-call or off-call hours.
NOTE: We’re often asked about what happens to the alerts when they’re put in maintenance. Events that come into PagerDuty are always viewable for reference, even if suppressed. These can be seen in the “Alerts” menu.
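As a rough illustration of a recurring, scheduled condition, the Python sketch below checks whether an event arrives inside a daily maintenance window and only suppresses it for specific monitoring tools. The function names, the tool names, and the "suppress"/"notify" actions are assumptions made for this example, not PagerDuty functionality.

```python
from datetime import datetime, time


def in_window(now: datetime, start: time, end: time) -> bool:
    """Return True if `now` falls in a daily window, which may cross midnight."""
    current = now.time()
    if start <= end:
        return start <= current < end
    # Window wraps past midnight, e.g. 23:30 to 01:00.
    return current >= start or current < end


def action_for(now: datetime, source: str, start: time, end: time,
               tools_under_maintenance: set) -> str:
    """Suppress only events from tools covered by the maintenance window."""
    if source in tools_under_maintenance and in_window(now, start, end):
        return "suppress"
    return "notify"
```

For example, with a window from 23:30 to 01:00 covering only a hypothetical `"nagios"` source, an event from that tool at 00:15 would be suppressed, while an event from any other tool at the same time would still notify.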
3) Controlling Alert Storms
Nobody wants to deal with an alert storm. But they do happen. The question is how to control your team’s experience when it happens during a partial or full outage so that it’s minimally disruptive and they can focus on the most important task at hand: get to the fix.
With Event Orchestration, customers can use threshold-based rules to control incident creation behavior during alert storms. You can configure rules whose actions run only until event volume reaches a certain threshold, or only once it exceeds that threshold. This gives you even more precision for event enrichment, routing, or grouping in relation to event volume.
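One way to picture a threshold-based rule is as a sliding-window counter. The sketch below is an illustration under assumptions, not how PagerDuty implements thresholds: the `ThresholdRule` class, its parameters, and the choice to suppress once the limit is exceeded are all hypothetical.

```python
from collections import deque


class ThresholdRule:
    """Hypothetical rule: act normally up to `limit` events per window."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def observe(self, timestamp: float) -> str:
        """Record an event time (in seconds) and return the action to take."""
        cutoff = timestamp - self.window_seconds
        # Forget events that have aged out of the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        self._timestamps.append(timestamp)
        if len(self._timestamps) <= self.limit:
            return "create_incident"
        # Assumption: above the threshold, further events are suppressed.
        return "suppress"
```

With `ThresholdRule(limit=3, window_seconds=60)`, the first three events inside a minute would create incidents and later ones in the same minute would be suppressed; once the window empties, the rule goes back to creating incidents.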
4) Routing and Enrichment
When troubleshooting, responders need to be able to quickly understand what happened during an outage. How can you highlight this information better in an incident so responders don’t waste time looking for it?
Event Orchestration can help customers with an automated way to approach standardization of incident data by:
- overriding malformed fields
- replacing fields based on known use cases
- updating the severity/priority/urgency
- adjusting incident creation behavior (email integration)
As an example, you could set up an orchestration so that any time an event’s payload contains “Response Time is High” with a value over 1000ms, the incident is immediately flagged as Priority 1.
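The response-time example can be sketched in a few lines of Python. The dictionary keys used here (`summary`, `custom_details`, `response_time_ms`, `priority`) and the `"P1"` label are assumptions for illustration; real events and priorities in your account may be shaped differently.

```python
def enrich(event: dict) -> dict:
    """Return a copy of the event, flagged as P1 when response time is high."""
    enriched = dict(event)  # shallow copy so the caller's event is untouched
    details = event.get("custom_details", {})
    is_slow = (
        "Response Time is High" in event.get("summary", "")
        and details.get("response_time_ms", 0) > 1000
    )
    if is_slow:
        # Hypothetical priority label; use whatever your account defines.
        enriched["priority"] = "P1"
    return enriched
```

An event whose summary contains “Response Time is High” and reports 1500ms would come back with `priority` set, while one reporting 800ms would pass through unchanged.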
5) Providing Runbooks
Anytime someone new joins your team, especially when they’re on the junior side, it takes a while to onboard them on the specific approaches that are part of your incident response processes. It takes time to explain and train on how to approach even well-understood, common incidents. One of the most basic forms of automation we’ve seen customers use to address this problem is simply writing down how they solve these issues in runbooks, which can be shared as tried-and-true ways to handle repeat issues.
Event Orchestration makes it easy to add notes that contain links to runbooks, or resolution instructions for known issues. That way, while triaging the incident and looking at the alert payload, the runbook is easily accessible for reference. Embedding this actionable intelligence during event processing on ingest means that L1 responders can easily solve common, well-understood issues without further escalation to senior engineers.
6) Updating Systems of Record
Customers using specific ITSM tools for major and minor incidents will be interested in how to keep their system of record in sync with their PagerDuty incidents.
With Event Orchestration webhooks, users can ensure that connected systems are updated as events are ingested. Specific rules contain webhooks that send payloads to these systems, creating records with up-to-date event payload information. We’ve seen this used with Jira, ServiceNow, and homegrown CMDB systems. Learn more about PagerDuty’s integrations with ITSM solutions here.
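PagerDuty sends these webhooks itself once a rule is configured, so no code is required. Purely to show what a webhook with custom headers might look like on the wire, the Python sketch below builds (but does not send) such a request. The URL, the bearer-token header, and the payload fields are hypothetical and would depend on the receiving system.

```python
import json
import urllib.request


def build_webhook_request(url: str, event: dict, api_token: str) -> urllib.request.Request:
    """Build a POST request carrying event details to a system of record."""
    body = json.dumps({
        "summary": event["summary"],
        "severity": event.get("severity", "unknown"),
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Hypothetical auth scheme; the receiving system defines its own.
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would actually send it; the sketch stops short of that so it can be inspected without a live endpoint.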
7) Automated Diagnostics and Remediation
Everyone wants to start automating their operational processes. This is not surprising: there are a LOT of manual steps associated with incidents. However, it can be hard to know where and how to start.
Automated diagnostics are a low-risk, high-value way to trim down MTTR. Think of all the diagnostics you’d have to run at the beginning of an investigation. Now imagine if those were already run by the time your responder got to the incident.
Event Orchestration makes it simple to integrate automation tools via webhooks. It also has a built-in native integration with PagerDuty Automation Actions, which can trigger automated diagnostics and remediation all in the PagerDuty platform. This helps cut down overall time to resolution since diagnostic results are piped directly into incident details and ready for the responder to review.
Learn more about Event Orchestration