Checkmk Integration Guide

Checkmk is built on top of Nagios, one of the leading open source and enterprise-grade IT infrastructure monitoring tools. Used by hundreds of thousands of users worldwide, Nagios lets its users monitor their entire IT infrastructure, spot problems before they occur, detect security breaches, and plan and budget for IT upgrades. By integrating PagerDuty into your existing Checkmk monitoring solution, you can have alerts go directly to the person on call in your PagerDuty schedule.

This guide describes how to integrate Checkmk 1.2.x, by itself or as part of the Open Monitoring Distribution (OMD), with PagerDuty using our easy-to-install agent. Note that you must be logged in as root to complete the installation. You might need to alter these instructions slightly depending on your Linux distribution and on your Checkmk configuration and version. Please contact our support team if you have any trouble completing the integration.

Note: If you are running Nagios on CentOS 5, you will need to use the Perl-based integration for Checkmk instead of following this guide.

In PagerDuty

    1. From the Configuration menu, select Services.
    2. On your Services page:

      If you are creating a new service for your integration, click +Add New Service.

      If you are adding your integration to an existing service, click the name of the service you want to add the integration to. Then click the Integrations tab and click the +New Integration button.


  1. Select Checkmk from the Integration Type menu and enter an Integration Name.

    If you are creating a new service for your integration, in General Settings, enter a Name for your new service. Then, in Incident Settings, specify the Escalation Policy, Notification Urgency, and Incident Behavior for your new service.
  2. Click the Add Service or Add Integration button to save your new integration. You will be redirected to the Integrations page for your service.
  3. Copy the Integration Key for your new integration.

On Your Checkmk Server

This guide includes steps for the standalone version of Checkmk as well as the OMD version. You will need to adjust the paths used depending on the version of Checkmk you’re using. Note that all commands provided are intended to be run as the root user.

  1. Install the PagerDuty Agent. The agent receives events from Checkmk and sends them to PagerDuty using a queue, provides logging that helps troubleshoot any problems, and automatically retries sending alerts after a connection failure (e.g. if your Checkmk server temporarily loses connectivity).
    Note: The Agent does not run on CentOS 5 or lower, as it requires a newer version of Python than the version included with CentOS 5. Please use the Perl-based integration for Checkmk on older operating systems.
  2. Download pagerduty-agent from GitHub and make it executable:
    wget https://gist.githubusercontent.com/Deconstrained/466645559c28240472f44864658ee48f/raw/0bbb795ee3309273a1dfe5d5269b4b8f2810d67e/pagerduty-agent
    chmod +x pagerduty-agent
    

    Note: pagerduty-agent is a notification script for Checkmk. It is not the PagerDuty Agent or a replacement for it; you must still install the PagerDuty Agent separately, as described in step 1.

  3. Move the notification script into place.

    For the standalone version of Checkmk this is usually /usr/share/check_mk/notifications:
    mv pagerduty-agent /usr/share/check_mk/notifications

    For the OMD version of Checkmk this is usually /omd/sites/{site-name-here}/local/share/check_mk/notifications:

    mv pagerduty-agent /omd/sites/{site-name-here}/local/share/check_mk/notifications
  4. Log in to the Checkmk web interface, go to Users (located in the WATO · Configuration box) and click New User.
  5. Enter a Username and, optionally, a Full name for the PagerDuty user. You may find it beneficial to set the full name to match the name of the PagerDuty service you created if you will want to configure Checkmk hosts and services to alert multiple PagerDuty services in the future.
    Do not enter a password for this user. Instead, check Disable the login to this account, as this account exists solely to send notifications to the PagerDuty Agent.
    Set the user’s role to Normal monitoring user, or any custom role you’ve created with permissions to send notifications, and add the user to the Contact Groups which the hosts/services you want to receive alerts for are part of. Click Save when you are done.
  6. Click the Notifications icon (broadcast tower) for the user you created. If you are using Checkmk 1.2.4 or earlier, click the Edit icon (pencil) instead.
  7. Click New Rule. If you are using Checkmk 1.2.4 or earlier, scroll down to the Notifications box instead.
  8. Enter a Description for the new notification method, then set Notification Method to PagerDuty Agent. Paste the Integration Key you copied from PagerDuty earlier in the text box that appears once you select PagerDuty Agent, and select any desired conditions to limit the alerts that get sent to PagerDuty. Click Save when you are done.

    If you are using Checkmk 1.2.4 or earlier, check enable notifications and set the Notification Method to Flexible Custom Notifications. Click Add notification and set the Notification Plugin to PagerDuty Agent. Paste the Integration Key you copied from PagerDuty earlier in the first Plugin Arguments text box that appears once you select PagerDuty Agent, then uncheck the boxes Start or end of flapping state and Start or end of scheduled downtime under Host Events and Service Events for the Notification Method (not the Notification Options). Click Save when you are done.
  9. Go back to the Users list and click # Changes, then click Activate Changes.
  10. Congratulations! When you see Configuration successfully activated, you are done. Checkmk will now be able to trigger, acknowledge and resolve incidents in PagerDuty, and the PagerDuty Agent will retry sending events that aren’t successfully sent on the first attempt (e.g. due to connectivity issues).
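The path choice in step 3 can be sketched as a small shell helper. This is only a sketch: notifications_dir and install_notification_script are hypothetical helper names, not part of Checkmk or PagerDuty, and the two paths are the "usual" locations named in step 3, which may differ on your system.

```shell
# Print the notifications directory from step 3.
# No argument: the usual standalone path.
# One argument (an OMD site name): the usual OMD path for that site.
notifications_dir() {
    site="${1:-}"
    if [ -n "$site" ]; then
        printf '%s\n' "/omd/sites/$site/local/share/check_mk/notifications"
    else
        printf '%s\n' "/usr/share/check_mk/notifications"
    fi
}

# Make the downloaded script executable and move it into place.
# Hypothetical helper: refuses to act if the script or the
# destination directory is missing, instead of guessing.
install_notification_script() {
    script="$1"
    dest="$(notifications_dir "${2:-}")"
    [ -f "$script" ] || { echo "missing script: $script" >&2; return 1; }
    [ -d "$dest" ] || { echo "missing directory: $dest" >&2; return 1; }
    chmod +x "$script" && mv "$script" "$dest/"
}
```

For example, install_notification_script pagerduty-agent would target the standalone path, while install_notification_script pagerduty-agent mysite would target the OMD path for a site called mysite (an illustrative name). Checking that the destination exists first makes a wrong path fail loudly rather than leaving the script somewhere Checkmk will not look.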

Next Steps

You can test the integration to make sure everything works as expected by going to a host or service in the Checkmk interface and clicking the Execute icon (hammer). In the Fake check results box, click Critical (if on a service) or Down (if on a host), then click Yes! to confirm you want to send the fake alert. You should see a new incident created in PagerDuty momentarily. Keep in mind, however, that the test incident may be resolved quickly, as the fake check results are replaced by real check results on the next scheduled check.

FAQ

How do I configure Checkmk to work with multiple PagerDuty services?

This is most easily accomplished with Event Routing, but it is also possible to do this within Checkmk itself by mapping each Checkmk service in PagerDuty directly to a user in Checkmk. To configure multiple services, create multiple users in Checkmk with different names (e.g. pagerduty_database, pagerduty_network). Then, for each user, copy and paste the corresponding Integration Key from PagerDuty into the Notification Method parameters/Plugin Arguments field. Don’t forget to activate your changes for the configuration to take effect.

Note: If you are using Rules in Checkmk to control which contact is notified under different conditions, the rules are tried sequentially, and only the first rule whose criteria match is applied.
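The effect of first-match ordering can be illustrated with a short shell sketch. It models only the ordering behaviour described in the note, not Checkmk's actual rule engine: first_matching_contact is a hypothetical function, and the "pattern:contact" rule format and the service names are invented for the illustration.

```shell
# Each rule is a "pattern:contact" pair. Rules are tried in the order
# given, and the first pattern that matches the service name wins.
first_matching_contact() {
    service="$1"
    shift
    for rule in "$@"; do
        pattern="${rule%%:*}"   # text before the first colon
        contact="${rule#*:}"    # text after the first colon
        case "$service" in
            $pattern) printf '%s\n' "$contact"; return 0 ;;
        esac
    done
    return 1   # no rule matched
}
```

With the specific rule first, first_matching_contact "MySQL Status" "MySQL*:pagerduty_database" "*:pagerduty_network" prints pagerduty_database. With the catch-all rule first, the same service goes to pagerduty_network and the MySQL rule is never reached, which is why broad rules generally belong at the end of the list.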

What if a Checkmk event happens while my network is down?

If a PagerDuty server can’t be reached for any reason, events are stored in an on-disk queue. The PagerDuty Agent will attempt to re-send the events when connectivity is restored.
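The store-then-retry idea can be sketched in a few lines of shell. This is not the PagerDuty Agent's implementation: QUEUE_DIR, SEND_CMD, enqueue_event and flush_queue are hypothetical names, the default queue location is arbitrary, and events are retried in file-name order rather than strict arrival order.

```shell
# Hypothetical settings for this sketch only.
QUEUE_DIR="${QUEUE_DIR:-/tmp/pd-queue-sketch}"
SEND_CMD="${SEND_CMD:-false}"   # command that sends one event file; "false" simulates an outage

# Store one event as a file in the queue directory.
enqueue_event() {
    mkdir -p "$QUEUE_DIR" || return 1
    tmp="$(mktemp "$QUEUE_DIR/event.XXXXXX")" || return 1
    printf '%s\n' "$1" > "$tmp"
}

# Try to send every queued event. An event file is deleted only
# after its send succeeds, so failed events stay for the next attempt.
flush_queue() {
    for f in "$QUEUE_DIR"/event.*; do
        [ -f "$f" ] || continue
        if $SEND_CMD "$f"; then
            rm -f "$f"
        fi
    done
}
```

Calling flush_queue periodically (for example from a loop or a scheduler) gives the retry behaviour: while the sender fails, files accumulate on disk and survive a restart, and once it succeeds they drain.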

Since Checkmk needs my external Internet connection to send failure reports to PagerDuty, how will I receive notification if our site loses external connectivity?

You should configure an external ping check service such as StatusCake or NodePing to monitor your site’s external connectivity. Of course, you can use PagerDuty to receive alerts from these services as well.

The integration doesn’t seem to be working. What’s going on?

First, make sure you’ve installed the PagerDuty Agent, and that there were no errors from your package manager when attempting to install it. Failed installs (e.g. due to an incompatible distribution, such as CentOS 5) are the most common reason the integration does not work.

Other common issues include the integration key being changed (e.g. from a user regenerating the key, or deleting and re-creating the Checkmk service in PagerDuty), or using the wrong integration type (e.g. Generic API instead of Checkmk).

If Checkmk alerts still aren’t triggering incidents in PagerDuty, check the notification log at /var/log/nagios.log (for the standalone version of Checkmk) or /omd/sites/{site-name-here}/var/log/nagios.log (for the OMD version) for potential errors, or contact our support team for assistance.
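One way to narrow down the log is to filter it for PagerDuty-related lines. The sketch below assumes that relevant entries mention "pagerduty" somewhere on the line, which may not hold for every error; nagios_log_path and recent_pagerduty_lines are hypothetical helper names, and the paths are the two given above.

```shell
# Print the notification log path: standalone when no site name
# is given, OMD when one is.
nagios_log_path() {
    site="${1:-}"
    if [ -n "$site" ]; then
        printf '%s\n' "/omd/sites/$site/var/log/nagios.log"
    else
        printf '%s\n' "/var/log/nagios.log"
    fi
}

# Show up to the last 20 lines of a log that mention PagerDuty
# (case-insensitive match).
recent_pagerduty_lines() {
    log="$1"
    [ -r "$log" ] || { echo "cannot read $log" >&2; return 1; }
    grep -i 'pagerduty' "$log" | tail -n 20
}
```

For a standalone install this would be run as recent_pagerduty_lines "$(nagios_log_path)"; for OMD, pass the site name to nagios_log_path. If nothing is printed, the notification may never have been attempted, which points back at the user, contact group, and rule configuration.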

What sort of Nagios messages does PagerDuty understand with the Checkmk integration?

PagerDuty can process PROBLEM, ACKNOWLEDGEMENT, and RECOVERY messages. All other messages, including FLAPPINGSTART and FLAPPINGSTOP, or custom messages, are ignored.
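The three handled message types, PROBLEM, ACKNOWLEDGEMENT, and RECOVERY, line up with the trigger, acknowledge, and resolve actions mentioned earlier in this guide. The pairing below is inferred from that wording rather than taken from the notification script itself, and pd_event_type is a hypothetical function name.

```shell
# Map a Nagios/Checkmk notification type to a PagerDuty action.
# Returns non-zero for types that are ignored.
pd_event_type() {
    case "$1" in
        PROBLEM)         printf 'trigger\n' ;;
        ACKNOWLEDGEMENT) printf 'acknowledge\n' ;;
        RECOVERY)        printf 'resolve\n' ;;
        *)               return 1 ;;   # FLAPPINGSTART, FLAPPINGSTOP, custom messages
    esac
}
```

A caller can use the exit status to skip ignored types, for example: action="$(pd_event_type "$type")" || exit 0.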
