Heartbeat Event Monitoring
Monitor system connectivity and detect missed heartbeats using PagerDuty's Heartbeat Event Monitoring.
Advanced State Management
Eliminate manual heartbeat management with automation that differentiates between maintenance windows and actual system failures using Cache Variables and Event Orchestration.
Service-Centric Context
Get immediate visibility into which business services are affected, their dependencies, and customer impact when heartbeats fail.
Automated Resolution
Accelerate response times with automatic alert resolution when heartbeats resume and incident creation that focuses teams on real problems rather than noise.
Problem
Organizations using legacy heartbeat monitoring face operational blind spots. Basic binary status checks force teams to manually associate heartbeat failures with actual system issues, create noise from stale alerts, and provide no context about service impact or business consequences.
Solution
PagerDuty's AIOps-powered heartbeat monitoring turns simple connectivity checks into managed operational signals. Automated state tracking, rule-based association, and service-centric context reduce noise, while visibility into business impact accelerates resolution.
Technical Job Steps
1a. Create Event Data Cache Variable
- For events where event_action = “trigger”
- Extract the dedup_key field from the current event
1b. Create Event Count Cache Variable
- For events where event_action = “trigger”
- Set the duration to your desired Heartbeat time window
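The two cache variables above can be pictured with a small local model. This is only a sketch of the behavior the steps describe, not PagerDuty's implementation: the class names `EventDataCache` and `EventCountCache`, the `observe` method, and the injectable `clock` are all hypothetical and exist purely for illustration.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional


class EventDataCache:
    """Hypothetical model of step 1a: remembers the latest trigger dedup_key."""

    def __init__(self) -> None:
        self.value: Optional[str] = None

    def observe(self, event: dict) -> None:
        # Only trigger events update the cached value.
        if event.get("event_action") == "trigger" and event.get("dedup_key"):
            self.value = event["dedup_key"]


class EventCountCache:
    """Hypothetical model of step 1b: counts trigger events in a time window."""

    def __init__(
        self,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.window_seconds = window_seconds
        self._clock = clock
        self._timestamps: Deque[float] = deque()

    def observe(self, event: dict) -> None:
        # Record the arrival time of each trigger event.
        if event.get("event_action") == "trigger":
            self._timestamps.append(self._clock())

    @property
    def count(self) -> int:
        # Drop arrivals that have aged out of the heartbeat window.
        cutoff = self._clock() - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps)
```

In this model, a count of zero means no heartbeat arrived within the window, which is the situation the orchestration rules below distinguish from a healthy stream of heartbeats.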
2. Create Orchestration Rule to Resolve Alerts
Condition
- For events where event_action = “trigger”
- Event Count Cache Variable >= 1
- Event Data Cache Variable Exists
Action
- Suspend Alert for the desired Heartbeat time window + 5 seconds (the additional time is a buffer that allows the webhook to resolve the previous alert)
Trigger Webhook on Alert Suspended
- URL: https://events.pagerduty.com/v2/enqueue
- Routing_key: the integration key copied from the Event Orchestration (EO)
- Dedup_key: {{EVENT DATA CACHE VARIABLE NAME}}
- Event_action: “resolve”
- *Additional Fields*: depending on the Routing Rules within the Event Orchestration, more fields may be required so that the event lands on the correct service and resolves the alert. These fields are not required if the Event Orchestration has Global Dedup configured.
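To check the webhook settings above by hand, the same resolve request can be sent from a script. The sketch below assumes only the URL and the three fields listed in this step; the helper names `build_resolve_event` and `send_event` are hypothetical, and any additional fields your Routing Rules need would have to be added to the payload yourself.

```python
import json
import urllib.request

EVENTS_API_URL = "https://events.pagerduty.com/v2/enqueue"


def build_resolve_event(routing_key: str, dedup_key: str) -> dict:
    """Build the resolve payload described in the webhook step."""
    if not routing_key or not dedup_key:
        raise ValueError("routing_key and dedup_key are both required")
    return {
        "routing_key": routing_key,  # integration key from the Event Orchestration
        "dedup_key": dedup_key,      # value held in the Event Data Cache Variable
        "event_action": "resolve",
    }


def send_event(event: dict, timeout: float = 10.0) -> int:
    """POST the event as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        EVENTS_API_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

For example, `send_event(build_resolve_event("YOUR_INTEGRATION_KEY", "previous-heartbeat-key"))` would issue the request; both argument values here are placeholders.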
3. Create Orchestration Rule to Capture First Heartbeat Event
Condition
- For events where event_action = “trigger”
- Event Count Cache Variable < 1
Action
- Suspend Alert for the desired Heartbeat time window + 5 seconds
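The two orchestration rules can be summarized as a small decision function. This is a sketch under stated assumptions rather than how Event Orchestration evaluates rules internally: it assumes the Event Count Cache Variable reflects earlier heartbeats and not the event being evaluated, and the names `choose_rule`, `suspend_seconds`, and the returned rule labels are hypothetical.

```python
from typing import Optional

BUFFER_SECONDS = 5  # extra time that lets the webhook resolve the previous alert


def suspend_seconds(heartbeat_window_seconds: int) -> int:
    """Suspension length used by both rules: the window plus the buffer."""
    return heartbeat_window_seconds + BUFFER_SECONDS


def choose_rule(
    event_action: str,
    event_count: int,
    cached_dedup_key: Optional[str],
) -> Optional[str]:
    """Return a label for the rule that would match, or None if neither does."""
    if event_action != "trigger":
        return None
    # Step 2: a previous heartbeat exists, so resolve it and suspend the new alert.
    if event_count >= 1 and cached_dedup_key is not None:
        return "resolve_previous_and_suspend"
    # Step 3: first heartbeat in the window, so only suspend the new alert.
    if event_count < 1:
        return "capture_first_heartbeat"
    return None
```

With a 60-second heartbeat window, both rules would suspend the alert for `suspend_seconds(60)` seconds. If no further heartbeat arrives in that time, the suspended alert is left unresolved, which is the missed-heartbeat signal.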