Heartbeat Event Monitoring

Monitor system connectivity and detect missed heartbeats using PagerDuty's Heartbeat Event Monitoring

Advanced State Management

Eliminate manual heartbeat management with automation that differentiates between maintenance windows and actual system failures using Cache Variables and Event Orchestration.

Service-Centric Context

Get immediate visibility into which business services are affected, their dependencies, and customer impact when heartbeats fail.

Automated Resolution

Accelerate response times with automatic alert resolution when heartbeats resume and incident creation that focuses teams on real problems rather than noise.

How can PagerDuty Advance assist you today?

Problem

Organizations using legacy heartbeat monitoring face operational blind spots with basic binary status checks that require manual association between heartbeat failures and actual system issues, creating noise from stale alerts and lacking context about service impact or business consequences.

Solution

PagerDuty's AIOps-powered heartbeat monitoring transforms simple connectivity checks into intelligent operational management with automated state tracking, rule-based association, and service-centric context that reduces noise while accelerating resolution times through comprehensive business impact visibility.

Technical Job Steps

1a. Create Event Data Cache Variable

  • For events where event_action = “trigger”
  • Extract the dedup_key field from the current event

1b. Create Event Count Cache Variable

  • For events where event_action = “trigger”
  • Set duration for your desired Heartbeat time window

2. Create Orchestration Rule to Resolve Alerts

Condition

  • For events where event_action = “trigger”
  • Event Count Cache Variable >= 1
  • Event Data Cache Variable Exists

Action

Suspend Alert for the desired Heartbeat time window + 5 seconds (the additional time is buffer to allow the webhook to resolve the previous alert)

Trigger Webhook on Alert Suspended

  • URL: https://events.pagerduty.com/v2/enqueue
  • Routing_key: COPY INTEGRATION KEY FROM EO
  • Dedup_key : {{EVENT DATA CACHE VARIABLE NAME}}
  • Event_action: “resolve”
  • *Additional Fields* (Please note that depending on the Routing Rules within the Event Orchestration, more fields may be required so that the event lands on the correct service to resolve the alert. This is not required if the Event Orchestration has Global Dedup configured)

3. Create Orchestration Rule to Capture First Heartbeat Event

Condition

  • For events where event_action = “trigger”
  • Event Count Cache Variable < 1

Action

  • Suspend Alert for the desired Heartbeat time window + 5 seconds

See what you can automate today