We’re happy to announce we’ve released the new version of PagerDuty, which has multi-incident support. To try it out, just log into your PagerDuty account.
This new feature corrects an oversimplification in PagerDuty’s design: until now, PD required you to create a separate alarm for each type of problem that your monitoring systems can detect. Unfortunately, this doesn’t work very well if you’re using a monitoring tool like Nagios, which can monitor thousands of hosts and services at once. The new release can handle multiple open incidents from a single monitoring system; we call this “multi-incident support”.
Here’s a quick summary of the changes in the new release:
The new version of PD is 100% backwards compatible with the previous version. Yes, we’ve renamed a bunch of stuff, but we’ve been very careful to retain the same behavior as the old version for your existing services. Read on for more details.
PagerDuty can now track multiple open incidents concurrently. Put another way, your monitoring system can tell PagerDuty about 100 simultaneous and independent problems without you needing to create 100 PagerDuty alarms (as was the case in the old version of PD).
PagerDuty now uses “incidents” rather than “alarms” as the main object. Your support team will be acknowledging, escalating, and resolving incidents, instead of alarms. Incidents in PagerDuty are similar to tickets in a bug tracking system: they are created when a problem is detected, and are resolved or closed when the problem is fixed.
Since PagerDuty can now handle hundreds of open incidents at once, we’ve tried to carefully design PagerDuty’s interface to make it easy to work with large collections of incidents. The new Incidents and Dashboard tabs feature tables that let you see all of the open incidents assigned to you at a glance. You can also easily triage your incidents straight from these pages using the controls located at the top of the table.
By default, the PagerDuty services still work the same way they’ve always worked: they can only have one incident open at once. The reason for this is to maintain backwards compatibility.
You can enable multi-incident support for any existing service. Here’s how:
The first option, if selected, will cause the service to open a new incident for each trigger email sent to the service’s email address.
The second option, if selected, will cause the service to open a new incident based on the email subject: if an open incident with the same subject already exists, the email is appended to this incident; if not, a new incident is created.
The third option, which should be selected by default for an existing service, allows a service to maintain the behavior of the old version of PagerDuty. It basically turns multi-incident support off: if selected, the service can only have one open incident at any one time. When the service receives a trigger email, it opens a new incident if the service doesn’t already have an open incident; otherwise, it appends the email to the open incident.
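The three options above amount to three rules for choosing which incident a trigger email belongs to. Here is a minimal in-memory sketch of those rules; it is an illustration only, not PagerDuty’s implementation, and the names `EmailMode`, `Service`, `Incident`, and `receive_trigger_email` are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class EmailMode(Enum):
    """Hypothetical names for the three email-handling options."""
    PER_EMAIL = "new incident for every trigger email"
    PER_SUBJECT = "one open incident per email subject"
    SINGLE = "one open incident per service (old behavior)"


@dataclass
class Incident:
    subject: str
    emails: list = field(default_factory=list)
    is_open: bool = True


@dataclass
class Service:
    # SINGLE mirrors the backwards-compatible default for existing services.
    mode: EmailMode = EmailMode.SINGLE
    incidents: list = field(default_factory=list)

    def open_incidents(self):
        return [i for i in self.incidents if i.is_open]

    def receive_trigger_email(self, subject, body):
        """Attach the email to an open incident or open a new one; return it."""
        target = None
        if self.mode is EmailMode.PER_SUBJECT:
            # Reuse an open incident only if its subject matches.
            target = next(
                (i for i in self.open_incidents() if i.subject == subject), None
            )
        elif self.mode is EmailMode.SINGLE:
            # Reuse whatever incident is currently open, regardless of subject.
            target = next(iter(self.open_incidents()), None)
        # PER_EMAIL never reuses an incident, so target stays None.
        if target is None:
            target = Incident(subject=subject)
            self.incidents.append(target)
        target.emails.append(body)
        return target
```

In this sketch, a service in `PER_SUBJECT` mode that receives two emails with the subject “disk full” and one with “cpu high” ends up with two open incidents, while the same three emails sent to a `SINGLE` service all land in one incident.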
We’ve renamed “alarms” to “services”. Services are now used only to represent an integration point between PagerDuty and your monitoring systems. Currently, PagerDuty services integrate with your monitoring systems via email (just like in the old version of PD). In the coming weeks, we will also add support for an HTTP-based API for PagerDuty services. This will allow your monitoring systems to trigger, acknowledge, and resolve incidents in PagerDuty via a synchronous API call.
For similar reasons, we’ve renamed “alarm groups” to “escalation policies”. We think the new name better captures the use of these objects.
We’ve also renamed incident “suppression” to “acknowledge”. As before, this feature is used to temporarily prevent an incident from generating alerts. We thought the word “acknowledge” better captured the purpose of the feature: “stop bothering me about this problem for now… I’m working on it!”
We’ve also made the acknowledgement timeout configurable on a service-by-service basis. This means that you can set the amount of time that an incident stays in the Acknowledged state before it reverts back to Triggered and alerts you again. The timeout is set to 30 minutes by default for each service, but you can easily change it or even turn it off.
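The revert-on-timeout behavior can be pictured as a small state machine. The sketch below is an illustration under stated assumptions, not PagerDuty’s code: the class, method names, and status strings are hypothetical, timestamps are plain seconds, and passing `None` as the timeout stands in for turning the timeout off.

```python
from dataclasses import dataclass
from typing import Optional

# 30 minutes, matching the default described above.
DEFAULT_ACK_TIMEOUT_SECONDS = 30 * 60


@dataclass
class Incident:
    status: str = "triggered"
    acknowledged_at: Optional[float] = None

    def acknowledge(self, now):
        """Silence alerts for this incident, starting the timeout clock."""
        if self.status == "triggered":
            self.status = "acknowledged"
            self.acknowledged_at = now

    def resolve(self):
        self.status = "resolved"
        self.acknowledged_at = None

    def tick(self, now, ack_timeout=DEFAULT_ACK_TIMEOUT_SECONDS):
        """Revert to triggered once the timeout has elapsed; None disables it."""
        if self.status != "acknowledged" or ack_timeout is None:
            return self.status
        if now - self.acknowledged_at >= ack_timeout:
            self.status = "triggered"
            self.acknowledged_at = None
        return self.status
```

Calling `tick` periodically with the current time is one simple way to drive the model; an incident acknowledged at time 0 stays acknowledged at 29 minutes and is triggered again at 30 minutes, unless it has been resolved in the meantime.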
Next up is support for a PagerDuty API. This will make it easier to integrate PagerDuty with popular monitoring tools like Nagios, Zenoss, monit, Munin and many others. The API will allow your monitoring system to trigger, acknowledge and resolve incidents directly in PagerDuty, via a synchronous call to the API.