Suppression. According to the thesaurus, this word is synonymous with terms like deletion, elimination, and annihilation.
Yet within the context of incident management, suppression means something quite different. It’s not about getting rid of data forever. It serves instead as a way of making sure that admins focus on the right alerts at the right time by mitigating noise.
Here’s a look at how suppression significantly helps streamline incident management.
Why is suppression useful in incident management? Simply put, modern infrastructure generates a huge volume of alerts, and admins cannot reasonably review every one of them. If they try, they will soon suffer from alert fatigue: overwhelmed and burned out, they begin ignoring potentially important alerts. And once they stop paying attention to alerts, the entire incident management process breaks down.
Alert suppression is a way of avoiding this issue. By suppressing alerts of certain types, admins can ensure that actionable, high-priority alerts receive the greatest attention. They can also reduce the overall number of alerts that appear on their dashboards, which helps to reduce the risk of alert fatigue.
As an example, consider an organization whose workstations reboot once a week overnight after updates are installed. The reboots would generate a series of alerts as workstations go offline and come back up. Adding these to the incidents dashboard that admins see wouldn’t be helpful, because the alerts in this case reflect a routine procedural event that does not require action. To keep this noise off their dashboards, admins can configure their incident management software to suppress alerts related to a workstation rebooting.
An important point to understand about alert suppression is that suppressing alerts is not an either/or proposition. In other words, admins’ options are not limited simply to enabling all alerts of a certain type or permanently suppressing all of them.
They can instead take a more nuanced approach to suppression. Alert suppression could be configured in such a way that alerts of a given type are suppressed unless they occur repeatedly within a certain period of time, for example. Alerts could also be configured so that they are reported if they occur during a certain time of day, but are suppressed during other times. Similarly, admins might want to suppress alerts of a particular type if they occur on a certain kind of device, but not others.
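As a rough illustration of the conditions described above, a suppression rule can be written as a small predicate over an alert. This is only a sketch: the `Alert` class, its fields, the `should_suppress` function, and the 22:00 to 06:00 window are all hypothetical names and values chosen for the example, not part of any particular incident management product.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class Alert:
    # Hypothetical alert shape, used only for this illustration.
    alert_type: str
    device_kind: str
    timestamp: datetime


def in_window(t: time, start: time, end: time) -> bool:
    """Return True if t falls in [start, end), including windows that cross midnight."""
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def should_suppress(alert: Alert) -> bool:
    """Suppress reboot alerts from workstations during an assumed overnight window."""
    if alert.alert_type != "reboot":
        return False  # other alert types are always reported
    if alert.device_kind != "workstation":
        return False  # e.g. a server reboot is still reported
    # Assumed maintenance window: 22:00 to 06:00.
    return in_window(alert.timestamp.time(), time(22, 0), time(6, 0))
```

With this rule, a workstation reboot at 02:00 is suppressed, while the same reboot at 14:00, or a reboot of a different kind of device at any hour, is still reported.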
This flexibility is important because it ensures that admins can maximize the effectiveness of alerts. Instead of applying broad, blunt suppression policies, they can tweak suppression settings in order to maximize the visibility of important events without adding unnecessary noise to the incident management system.
Nuanced suppression could be helpful in the example above. As I noted, admins generally don’t want to receive alerts when a workstation reboots in the middle of the night following a software update. But if the incident management software detects a workstation that reboots multiple times during the same period, that could signal a problem (like a flawed software update) that admins will want to know about. In this situation, configuring suppression so that only recurring reboots generate incidents in the central dashboard would help to optimize incident management effectiveness.
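One way to picture the recurring-reboot case is a sliding-window counter per device: a single reboot stays suppressed, but repeated reboots within the window are reported. The sketch below assumes events arrive in chronological order; the `RecurrenceFilter` class, its parameters, and the device identifiers are hypothetical and exist only for this example.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class RecurrenceFilter:
    """Suppress an event until it recurs `threshold` times within `window`."""

    def __init__(self, threshold: int, window: timedelta) -> None:
        self.threshold = threshold
        self.window = window
        self._seen = defaultdict(deque)  # device id -> recent event times

    def should_report(self, device_id: str, when: datetime) -> bool:
        times = self._seen[device_id]
        times.append(when)
        # Drop events that have fallen out of the sliding window.
        while times and when - times[0] > self.window:
            times.popleft()
        return len(times) >= self.threshold
```

For instance, with `RecurrenceFilter(threshold=3, window=timedelta(hours=1))`, the first two reboots of a given workstation within the hour stay suppressed and the third is reported.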
It’s also worth emphasizing that suppression in the context of incident management does not mean that suppressed alerts disappear forever. On the contrary, suppressed alerts still happen, and data related to them should be saved. The only difference between a suppressed alert and a non-suppressed one is that the former is not sent to priority dashboards in the incident management system.
This is important to understand because it means that admins retain the ability to look up suppressed alerts to gain insight into an incident if they need to. This also helps them better tune their alerting thresholds. In addition, suppressed alerts still figure into historical incident management data, which can be used to reveal lots of valuable information about infrastructure efficiency and health trends.
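The distinction between suppressing an alert and discarding it can be sketched as a router that records every alert in a history log but forwards only non-suppressed alerts to the dashboard. The `AlertRouter` class, its fields, and the dictionary-based alert shape are assumptions made for illustration; a real system would persist the history rather than hold it in memory.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AlertRouter:
    # Hypothetical rule: returns True when an alert should be suppressed.
    suppress: Callable[[Dict], bool]
    history: List[Dict] = field(default_factory=list)    # every alert, suppressed or not
    dashboard: List[Dict] = field(default_factory=list)  # only non-suppressed alerts

    def handle(self, alert: Dict) -> None:
        suppressed = self.suppress(alert)
        # Always record the alert, tagged with the suppression decision.
        self.history.append({**alert, "suppressed": suppressed})
        if not suppressed:
            self.dashboard.append(alert)

    def suppressed_alerts(self) -> List[Dict]:
        """Look up suppressed alerts later, e.g. when investigating an incident."""
        return [a for a in self.history if a["suppressed"]]
```

Because the history keeps both kinds of alert, it can also feed trend analysis or help tune the suppression rule itself.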
With suppression, then, you get to have your alerts and eat them, too—or something like that.
Suppressed alerts remain available whenever admins need them to help identify and respond to incidents, but they don’t clutter dashboards with non-actionable information that gets in the way of resolving higher-priority incidents. Moreover, suppression can be tuned so that alerts are suppressed only under exactly the right circumstances, while still being recorded so that you retain full visibility into your infrastructure.