3 Major New Features – Part 2: The Nagios -> PagerDuty API

by Andrew Miklas August 3, 2010 | 2 min read

This is the second article of a three-part series about the latest improvements to PagerDuty. Be sure to check out Part 1 and Part 3.

We’ve just released a Nagios API for PagerDuty. If you’re using Nagios to monitor your hosts, you no longer have to use PagerDuty’s email integration mechanism to get SMSes and phone calls from your Nagios installation. Instead, you can bypass the email step entirely and have Nagios communicate problem, acknowledgement, and recovery messages directly to PagerDuty via an HTTPS API.


The main benefit of the API over the email integration mechanism is that PagerDuty can now automatically close out incidents when Nagios reports that the problem has been fixed.  No more getting a call 30 minutes after fixing a problem because you forgot to mark the incident as resolved in PagerDuty!  Also, since the API allows us to distinguish between PROBLEM and RECOVERY messages, PagerDuty will no longer spuriously start the alerting process on a RECOVERY message.
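To make the difference concrete, here is a minimal Python sketch of how the three Nagios notification types could drive an incident's lifecycle. It is an illustration only, not PagerDuty's implementation: the `IncidentTracker` class, the action names in `ACTIONS`, and the `host/service` key format are all assumptions made for this example.

```python
# Assumed mapping from Nagios notification types to incident actions.
# The action names are illustrative; they are not taken from the post.
ACTIONS = {
    "PROBLEM": "trigger",
    "ACKNOWLEDGEMENT": "acknowledge",
    "RECOVERY": "resolve",
}


class IncidentTracker:
    """Toy in-memory model of open incidents, keyed by host/service."""

    def __init__(self):
        self.open_incidents = {}  # key -> "triggered" or "acknowledged"
        self.alerts_started = 0   # how many times alerting was kicked off

    def handle(self, notification_type, key):
        """Apply one Nagios notification and return the action taken."""
        action = ACTIONS.get(notification_type)
        if action == "trigger":
            # Only a new PROBLEM starts the alerting process.
            if key not in self.open_incidents:
                self.open_incidents[key] = "triggered"
                self.alerts_started += 1
        elif action == "acknowledge":
            if key in self.open_incidents:
                self.open_incidents[key] = "acknowledged"
        elif action == "resolve":
            # RECOVERY closes the incident if one is open and never alerts.
            self.open_incidents.pop(key, None)
        return action
```

In this sketch a RECOVERY for a key with no open incident is simply a no-op, which mirrors the point above: a recovery message closes things out but never pages anyone.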

Using the new Nagios API is simple: create a Nagios service within PagerDuty, copy a small Perl script to your Nagios server, and add a “pseudo-contact” corresponding to the new service to your Nagios config. For step-by-step details on how to do this, please take a look at our Nagios integration guide.

By switching your Nagios installation to use the API, you’ll be able to benefit from a number of new PagerDuty features we have planned.  One feature now in the works is the ability to have PagerDuty send out email and SMS alerts when an incident is resolved.  With this feature, you’ll be able to see at a glance whether an issue has resolved itself before crawling out of bed at 3am.

Another feature we’re now considering is the ability to assign Nagios alerts to different PagerDuty Escalation Policies based on Nagios variables such as HOSTGROUP and SERVICEGROUP. Let us know if this is something your ops team would use.
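As a purely hypothetical sketch of what such routing could look like, the Python snippet below picks an escalation policy from an alert's Nagios variables. The rule format, the group names, the policy names, and the `pick_policy` function are all invented for illustration; nothing here describes a shipped or planned PagerDuty interface.

```python
# Hypothetical routing rules: (Nagios variable, value, escalation policy name).
# Rules are checked in order, so earlier entries take precedence.
RULES = [
    ("SERVICEGROUP", "databases", "DBA On-Call"),
    ("HOSTGROUP", "web-servers", "Web Ops"),
]
DEFAULT_POLICY = "Default"


def pick_policy(alert_vars):
    """Return the first matching policy name for an alert's Nagios variables."""
    for variable, value, policy in RULES:
        if alert_vars.get(variable) == value:
            return policy
    # No rule matched, so fall back to the default policy.
    return DEFAULT_POLICY
```

With these example rules, an alert carrying `{"HOSTGROUP": "web-servers"}` would go to the "Web Ops" policy, and an alert matching no rule would fall back to "Default".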