Check_MK Integration Guide

Check_MK is built on top of Nagios, which is one of the leading providers of open source and enterprise-grade IT infrastructure monitoring tools. Used by hundreds of thousands of users worldwide, Nagios allows its users to monitor their entire IT infrastructure, spot problems before they occur, detect security breaches, and plan and budget for IT upgrades. By integrating PagerDuty into your existing Check_MK monitoring solution, you can have alerts go directly to the person on call in your PagerDuty schedule.

This guide describes how to integrate Check_MK 1.2.x, by itself or as part of the Open Monitoring Distribution (OMD), with PagerDuty using our easy-to-install agent. Note that you must be logged in as root to complete the installation. You may need to adjust these instructions slightly depending on your Linux distribution and your Check_MK configuration and version. Please contact our support team if you have any trouble completing the integration.

Note: If you are running Nagios on CentOS 5, you will need to use the Perl-based integration for Check_MK instead of following this guide.

In PagerDuty

    1. From the Configuration menu, select Services. 
    2. On your Services page:

      If you are creating a new service for your integration, click +Add New Service.

      If you are adding your integration to an existing service, click the name of the service you want to add the integration to. Then click the Integrations tab and click the +New Integration button.

    3. Select Check_MK from the Integration Type menu and enter an Integration Name.

      If you are creating a new service for your integration, in General Settings, enter a Name for your new service. Then, in Incident Settings, specify the Escalation Policy, Notification Urgency, and Incident Behavior for your new service.

    4. Click the Add Service or Add Integration button to save your new integration. You will be redirected to the Integrations page for your service.

    5. Copy the Integration Key for your new integration.

    On Your Check_MK Server

    This guide includes steps for the standalone version of Check_MK as well as the OMD version. You will need to adjust the paths used depending on the version of Check_MK you’re using. Note that all commands provided are intended to be run as the root user.

    1. Install the PagerDuty Agent. The agent receives events from Check_MK and sends them to PagerDuty using a queue, provides logging that helps troubleshoot any problems, and automatically retries sending alerts after a connection failure (e.g. if your Check_MK server temporarily loses connectivity).

      Note: The Agent does not run on CentOS 5 or lower, as it requires a newer version of Python than the version included with CentOS 5. Please use the Perl-based integration for Check_MK on older operating systems.

    2. Download pagerduty-agent from GitHub and make it executable:

      wget https://gist.githubusercontent.com/jcurreee/b21938f316f92dd3fadf/raw/cd59f855145692d96dd32164190faa1237a0d89e/pagerduty-agent
      chmod +x pagerduty-agent

      Note: The pagerduty-agent file downloaded in this step is a notification script for Check_MK. It is not the PagerDuty Agent itself, nor a replacement for it, so you must still install the PagerDuty Agent separately as described in step 1.

    3. Move the notification script into place.

      For the standalone version of Check_MK this is usually /usr/share/check_mk/notifications:

      mv pagerduty-agent /usr/share/check_mk/notifications

      For the OMD version of Check_MK this is usually /omd/sites/{site-name-here}/local/share/check_mk/notifications:

      mv pagerduty-agent /omd/sites/{site-name-here}/local/share/check_mk/notifications

    4. Log in to the Check_MK web interface, go to Users (located in the WATO · Configuration box) and click New User.

    5. Enter a Username and, optionally, a Full name for the PagerDuty user. You may find it beneficial to set the full name to match the name of the PagerDuty service you created if you will want to configure Check_MK hosts and services to alert multiple PagerDuty services in the future.

      Do not enter a password for this user. Instead, check the "disable the login to this account" option, as this account exists solely to send notifications to the PagerDuty Agent.

      Set the user’s role to Normal monitoring user, or any custom role you’ve created with permissions to send notifications, and add the user to the Contact Groups that contain the hosts and services you want to receive alerts for. Click Save when you are done.

    6. Click the Notifications icon (broadcast tower) for the user you created. If you are using Check_MK 1.2.4 or earlier, click the Edit icon (pencil) instead.

    7. Click New Rule. If you are using Check_MK 1.2.4 or earlier, scroll down to the Notifications box instead.

    8. Enter a Description for the new notification method, then set Notification Method to PagerDuty Agent. Paste the Integration Key you copied from PagerDuty earlier in the text box that appears once you select PagerDuty Agent, and select any desired conditions to limit the alerts that get sent to PagerDuty. Click Save when you are done.

      If you are using Check_MK 1.2.4 or earlier, check enable notifications and set the Notification Method to Flexible Custom Notifications. Click Add notification and set the Notification Plugin to PagerDuty Agent. Paste the Integration Key you copied from PagerDuty earlier in the first Plugin Arguments text box that appears once you select PagerDuty Agent, then uncheck the boxes Start or end of flapping state and Start or end of scheduled downtime under Host Events and Service Events for the Notification Method (not the Notification Options). Click Save when you are done.

    9. Go back to the Users list and click # Changes, then click Activate Changes.

    10. Congratulations! When you see Configuration successfully activated, you are done. Check_MK will now be able to trigger, acknowledge and resolve incidents in PagerDuty, and the PagerDuty Agent will retry sending events that aren’t successfully sent on the first attempt (e.g. due to connectivity issues).
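Steps 2 and 3 above (making the notification script executable and moving it into place) could be scripted roughly as follows. This is a minimal sketch, not part of the official integration: the function names notifications_dir and install_notification_script are hypothetical, the directories are only the usual locations named in step 3 and may differ on your system, and the script assumes pagerduty-agent has already been downloaded.

```shell
#!/bin/sh
# Hypothetical helper: print the usual notifications directory.
# With no argument the standalone layout is assumed; with an OMD
# site name as the argument, the OMD layout is assumed.
notifications_dir() {
    site="${1:-}"
    if [ -n "$site" ]; then
        printf '%s\n' "/omd/sites/$site/local/share/check_mk/notifications"
    else
        printf '%s\n' "/usr/share/check_mk/notifications"
    fi
}

# Hypothetical helper: make an already-downloaded script executable
# and move it into the notifications directory.
#   $1 = path to the downloaded script
#   $2 = OMD site name (omit for standalone Check_MK)
install_notification_script() {
    script="${1:-}"
    dest="$(notifications_dir "${2:-}")"
    [ -f "$script" ] || { echo "missing file: $script" >&2; return 1; }
    [ -d "$dest" ] || { echo "no such directory: $dest" >&2; return 1; }
    chmod +x "$script" && mv "$script" "$dest/"
}
```

For example, running install_notification_script pagerduty-agent as root would target the standalone directory, while install_notification_script pagerduty-agent mysite would target the OMD site named mysite (a placeholder name). The function refuses to proceed if the target directory does not exist, which is a quick signal that the paths need adjusting for your installation.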

    Next Steps

    You can test the integration to make sure everything works as expected by going to a host or service in the Check_MK interface and clicking the Execute icon (hammer). In the Fake check results box, click Critical (if on a service) or Down (if on a host), then click Yes! to confirm you want to send the fake alert. You should see a new incident created in PagerDuty shortly. Keep in mind that the test incident may be resolved quickly, because the fake check result is replaced by a real one on the next scheduled check.

    FAQ

    How do I configure Check_MK to work with multiple PagerDuty services?

    This is easy to do with the current integration, because each Check_MK service in PagerDuty maps directly to a user in Check_MK. To configure multiple services, create one Check_MK user per PagerDuty service, each with a distinct name (e.g. pagerduty_database, pagerduty_network). Then paste the corresponding Integration Key from PagerDuty into each user’s Notification Method parameters/Plugin Arguments field. Don’t forget to activate your changes for the configuration to take effect.

    What if a Check_MK event happens while my network is down?

    If a PagerDuty server can’t be reached for any reason, events will be stored to an on-disk queue. The PagerDuty agent will attempt to re-send the events when connectivity is restored.

    Since Check_MK needs my external Internet connection to send failure reports to PagerDuty, how will I receive notification if our site loses external connectivity?

    You should configure an external ping check service such as StatusCake or NodePing to monitor your site’s external connectivity. Of course, you can use PagerDuty to receive alerts from these services as well.

    The integration doesn’t seem to be working. What’s going on?

    First, make sure you’ve installed the PagerDuty Agent, and that there were no errors from your package manager when attempting to install it. Failed installs (e.g. due to an incompatible distribution, such as CentOS 5) are the most common reason the integration does not work.

    Other common issues include the integration key being changed (e.g. by a user regenerating the key, or deleting and re-creating the Check_MK service in PagerDuty), or using the wrong integration type (e.g. Generic API instead of Check_MK).

    If Check_MK alerts still aren’t triggering incidents in PagerDuty, check the notification log at /var/log/nagios.log (for the standalone version of Check_MK) or /omd/sites/{site-name-here}/var/log/nagios.log (for the OMD version) for potential errors, or contact our support team for assistance.
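As a starting point for checking the log, the following sketch prints recent log lines that mention PagerDuty. It is illustrative only: the function names notification_log and recent_pagerduty_lines are hypothetical, the paths are the ones given above, and searching for the string "pagerduty" is an assumption about what relevant entries contain.

```shell
#!/bin/sh
# Hypothetical helper: print the log path named in this guide.
# With no argument the standalone path is used; with an OMD site
# name as the argument, the OMD path is used.
notification_log() {
    site="${1:-}"
    if [ -n "$site" ]; then
        printf '%s\n' "/omd/sites/$site/var/log/nagios.log"
    else
        printf '%s\n' "/var/log/nagios.log"
    fi
}

# Hypothetical helper: show the last 20 log lines that mention
# "pagerduty" (case-insensitive). Assumes relevant entries contain
# that string, which may not hold for every setup.
recent_pagerduty_lines() {
    log="$(notification_log "${1:-}")"
    [ -r "$log" ] || { echo "cannot read $log" >&2; return 1; }
    grep -i 'pagerduty' "$log" | tail -n 20
}
```

Calling recent_pagerduty_lines with no argument reads the standalone log, and recent_pagerduty_lines mysite reads the log for an OMD site named mysite (a placeholder name). If nothing is printed, widen the search or read the log directly before concluding that no errors were recorded.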

    What sort of Nagios messages does PagerDuty understand with the Check_MK integration?

    PagerDuty can process PROBLEM, ACKNOWLEDGEMENT, and RECOVERY messages. All other messages, including FLAPPINGSTART and FLAPPINGSTOP, or custom messages, are ignored.
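The filtering described above can be pictured as a lookup from notification type to PagerDuty action. The sketch below is an illustration rather than the integration's actual code: the function name pd_event_for and the action names trigger, acknowledge, resolve and ignore are assumptions, chosen to match the trigger, acknowledge and resolve behavior described earlier in this guide.

```shell
#!/bin/sh
# Hypothetical helper: map a Nagios/Check_MK notification type to the
# PagerDuty action it is assumed to produce. Anything that is not
# PROBLEM, ACKNOWLEDGEMENT or RECOVERY is ignored.
pd_event_for() {
    case "${1:-}" in
        PROBLEM)         echo "trigger" ;;
        ACKNOWLEDGEMENT) echo "acknowledge" ;;
        RECOVERY)        echo "resolve" ;;
        *)               echo "ignore" ;;
    esac
}
```

For example, pd_event_for RECOVERY prints resolve, while pd_event_for FLAPPINGSTART prints ignore. The catch-all branch means custom or unrecognized types are dropped by default.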