Prometheus Integration Guide

Prometheus + PagerDuty Benefits

  • Send richly formatted event data from Prometheus to PagerDuty, allowing you to engage the right people, accelerate resolution and improve learning.
  • Create high and low urgency incidents based on the severity of the event from the Prometheus event payload.

How it Works

  • Prometheus sends events to PagerDuty via a Prometheus Alertmanager configuration file. Events from Prometheus will trigger a new incident on the corresponding PagerDuty service, or group as alerts into an existing incident. 
  • PagerDuty incidents are resolved automatically when the corresponding alert resolves in Prometheus, as long as the send_resolved configuration option is not set to false. The default value is true, so you do not need to specify send_resolved: true.
  • The Prometheus integration uses our v1 Events API.
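
As a minimal sketch of the send_resolved behavior described above, the receiver below opts out of automatic resolution. The receiver name and key are placeholders, and you would only set this option if you want PagerDuty incidents to stay open after the Prometheus alert clears:

```yaml
receivers:
- name: YOUR-RECEIVER-NAME
  pagerduty_configs:
  - service_key: YOUR-INTEGRATION-KEY
    # The default is true. Setting false stops Alertmanager from sending
    # resolve events, so incidents must be resolved manually in PagerDuty.
    send_resolved: false
```

Omitting the send_resolved line entirely keeps the default behavior of resolving incidents automatically.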

Requirements

  • Important note for Prometheus Alertmanager v0.11 and later: Alertmanager now supports Events API v2. However, if you set the routing_key property and use v2, the integration type of the integration corresponding to the routing_key value must also be Events API v2. If you select Prometheus as the integration type in PagerDuty, you will need to use the Events API v1 type and set a value for the service_key property instead.
  • A Manager base role or higher is required to configure this integration. If you’re not sure what role you have, or if you need your permissions adjusted, visit our sections on Checking Your User Role or Changing User Roles.
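
To illustrate the Events API note above, here is a hedged sketch of the two receiver styles side by side. The receiver names and keys are placeholders; which one applies depends on the integration type you chose in PagerDuty:

```yaml
receivers:
# Integration type "Prometheus" in PagerDuty: Events API v1, keyed by service_key.
- name: YOUR-V1-RECEIVER-NAME
  pagerduty_configs:
  - service_key: YOUR-PROMETHEUS-INTEGRATION-KEY

# Integration type "Events API v2" in PagerDuty: keyed by routing_key.
- name: YOUR-V2-RECEIVER-NAME
  pagerduty_configs:
  - routing_key: YOUR-EVENTS-API-V2-INTEGRATION-KEY
```

Set only one of service_key or routing_key in a given pagerduty_configs entry.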

Integration Walkthrough

In PagerDuty

There are two ways to integrate with PagerDuty: via global event routing or directly through an integration on a PagerDuty service. Integrating with global event routing may be beneficial if you want to build different routing rules based on the events coming from the integrated tool. Integrating with a PagerDuty service directly can be beneficial if you don’t need to route alerts from the integrated tool to different responders based on the event payload. 

Integrating with Global Event Routing

1. From the Configuration menu, select Event Rules.

2. On the Event Rules screen, copy your Integration Key and keep it in a safe place for later use.

You can now proceed to the On Your Prometheus Server section below. 

Integrating With a PagerDuty Service

1. From the Configuration menu, select Services.

2. If you are creating a new service for your integration, please follow the steps outlined in the Create a New Service section, selecting Prometheus as the Integration Type in step 4. Continue with step 4 (below) once you have finished these steps.

If you are adding your integration to an existing service, click the name of the service you want to add the integration to. Then click the Integrations tab and click the +New Integration button.

3. Select Prometheus from the Integration Type menu and enter an Integration Name in the format monitoring-tool-service-name (e.g. “Prometheus-Checkout-Server”). Click the Add Integration button to save your new integration. 

4. You will be redirected to the Integrations page for your service. Copy the Integration Key for your new integration.

On Your Prometheus Server

1. Install the Prometheus Alertmanager if you don’t have it installed already. The Alertmanager is required for this integration, as it handles routing alerts from Prometheus to PagerDuty.

2. Create an Alertmanager configuration file if you don’t have one already. You can find an example configuration file on GitHub.

3. Create a receiver for PagerDuty in your configuration file. Give the receiver a name, such as “PagerDuty-Global-Event-Rules” or the name of the Service you’re integrating with. Next, paste the PagerDuty Integration Key (generated in the In PagerDuty section, above) in the service_key field, then save your configuration file.

receivers:
- name: YOUR-RECEIVER-NAME
  pagerduty_configs:
  - service_key: YOUR-INTEGRATION-KEY

4. You can configure the default route in your Alertmanager configuration file to send all alerts that don’t match any custom routes to your new PagerDuty receiver. Here’s an example showing how you would configure the default route:

route:
 group_by: [cluster]
 receiver: YOUR-RECEIVER-NAME

5. You can also configure custom `routes` to send alerts to different `receivers`. For example, if you only want alerts with the severity of `warning` to be sent to PagerDuty, you would set a different default route and create a special `warning` route like this:

 routes:
  - match:
      severity: 'warning'
    receiver: YOUR-RECEIVER-NAME
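
For context, the fragment above sits under the top-level route. The sketch below shows one way the full structure might look, using the matchers syntax that recent Alertmanager releases document as the replacement for the deprecated match option (check the configuration reference for your version before switching). YOUR-DEFAULT-RECEIVER-NAME is a hypothetical placeholder for a receiver that does not notify PagerDuty:

```yaml
route:
  # Alerts that match no child route go here.
  receiver: YOUR-DEFAULT-RECEIVER-NAME
  routes:
  # Only alerts labeled severity="warning" are sent to PagerDuty.
  - matchers:
    - severity="warning"
    receiver: YOUR-RECEIVER-NAME

receivers:
# A receiver with no notifier configs silently drops its notifications.
- name: YOUR-DEFAULT-RECEIVER-NAME
- name: YOUR-RECEIVER-NAME
  pagerduty_configs:
  - service_key: YOUR-INTEGRATION-KEY
```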

6. Thanks to the Prometheus Alertmanager’s powerful routes and receiver configuration options, you can configure multiple receivers with different PagerDuty integration keys, and different routes to send specific types of alerts to different receivers.

Here’s an example configuration that sets up a route to capture alerts for a database service and send them to a receiver linked to a PagerDuty service that notifies the DBAs directly, while all other alerts are directed to a default receiver with a different PagerDuty integration key:

route:
 group_by: [cluster]
 receiver: DEFAULT-RECEIVER
 group_interval: 5m
 routes:
  - match:
      service: database
    receiver: DATABASE-RECEIVER

receivers:
- name: DEFAULT-RECEIVER
  pagerduty_configs:
  - service_key: PRIMARY-INTEGRATION-KEY

- name: DATABASE-RECEIVER
  pagerduty_configs:
  - service_key: DATABASE-INTEGRATION-KEY

7. Start the Alertmanager, or restart it for your configuration changes to take effect if it was already running.

8. Congratulations! Prometheus can now trigger and resolve incidents in PagerDuty. You can verify this by triggering a test incident with the following curl command. If your Alertmanager version no longer serves the v1 API, post the same payload to /api/v2/alerts instead:

curl -H 'Content-Type: application/json' -d '[{"labels": {"alertname": "PagerDuty Test"}}]' http://localhost:9093/api/v1/alerts

FAQ

Will PagerDuty incidents be resolved when an alert is resolved in Prometheus?

Yes, as long as the send_resolved configuration option is not set to false. The default value is true, so there’s no need to specify send_resolved: true to have PagerDuty incidents be resolved automatically.

Also note that a resolve notification may not be sent until the next group_interval elapses, and, according to the Prometheus team, delivery of the notification to PagerDuty is only “best effort.”
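
As a rough illustration of the timing options involved, the route below spells out the grouping intervals explicitly. The values shown are illustrative rather than recommendations, and the receiver name is a placeholder:

```yaml
route:
  group_by: [cluster]
  receiver: YOUR-RECEIVER-NAME
  # How long to wait before sending the first notification for a new group.
  group_wait: 30s
  # How long to wait before notifying about changes to an existing group,
  # which includes alerts in the group becoming resolved.
  group_interval: 5m
  # How long to wait before re-sending a notification that is still firing.
  repeat_interval: 4h
```

A shorter group_interval should shorten the delay before resolves reach PagerDuty, at the cost of more frequent notifications for groups that are changing.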

I only get one notification for multiple different Prometheus alerts; how do I fix this?

Try adjusting the match and group_by options for your PagerDuty route. The deduplication key (a.k.a. incident key), which determines whether alerting events concern a unique issue, is generated from these options. If a series of alerts has the same values for the labels listed in group_by, the alerts share the same deduplication key and are merged into the earliest existing open alert/incident rather than triggering new ones.
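
For example, a route along these lines groups more finely so that alerts differing in any of the listed labels produce separate incidents. The label names are assumptions about how your alerts are labeled, and the receiver name is a placeholder:

```yaml
route:
  receiver: YOUR-RECEIVER-NAME
  # One group, and therefore one deduplication key, per unique combination
  # of these labels. Adjust the list to labels your alerts actually carry.
  group_by: [alertname, cluster, instance]
  # Alertmanager also accepts a special value that groups by all labels,
  # which effectively disables aggregation:
  # group_by: ['...']
```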
