August 24, 2017
Anything can happen while you’re on-call. You can experience a quiet, incident-free shift or suffer a severe outage that makes your head explode. Since you don’t know what you’ll get, you have to be prepared for anything. Being on-call is stressful enough as it is, so we strive to make it less painful with easy scheduling and actionable alert routing to the right person. When you’re not on-call, we’ll leave you alone so you can enjoy your personal time off.
Flexible Schedules For Your Unique Needs
Everyone wants time off to recharge. With rotating on-call shifts, you can have it. On-call should not be limited by location or time zone, so extend on-call to teammates who were previously out of range and share the responsibility fairly. PagerDuty on-call schedules contain Layers, each made up of team members who share the same rotation. Rotations are customizable and can be restricted to specific times of the day. If you have a global team and want everyone to be able to save their nights for romance (or whatever they’re into), create Follow-The-Sun schedules with business-hours-only layers.
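To make the idea of layers concrete, here is a minimal sketch in Python. It is not PagerDuty's API or data model; the `Layer` class, its field names, the example team members, and the UTC hour windows are all assumptions made up for illustration. Each layer rotates through its members on a fixed shift length and only covers a restricted window of hours, and three such layers stacked together give a follow-the-sun schedule.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Layer:
    """Hypothetical model of a schedule layer (not a PagerDuty type)."""
    users: List[str]          # members who share this rotation, in order
    first_handoff: datetime   # when the first member's shift begins (UTC)
    shift: timedelta          # how long each member stays on-call
    start_hour: int = 0       # restricted window start, inclusive (UTC hour)
    end_hour: int = 24        # restricted window end, exclusive (UTC hour)

    def on_call(self, at: datetime) -> Optional[str]:
        """Return the member covering `at`, or None if the layer is inactive."""
        if at < self.first_handoff:
            return None
        if not (self.start_hour <= at.hour < self.end_hour):
            return None
        # Count how many whole shifts have elapsed, then wrap around the team.
        turns = (at - self.first_handoff) // self.shift
        return self.users[turns % len(self.users)]


def who_is_on_call(layers: List[Layer], at: datetime) -> Optional[str]:
    """Return the first layer's answer that covers `at`, if any."""
    for layer in layers:
        user = layer.on_call(at)
        if user is not None:
            return user
    return None


# Assumed example: weekly rotations that hand off on a Tuesday (2017-01-03),
# with each region covering only its own eight-hour block of the UTC day.
FIRST_HANDOFF = datetime(2017, 1, 3)
WEEK = timedelta(weeks=1)
follow_the_sun = [
    Layer(["Aiko", "Ben"], FIRST_HANDOFF, WEEK, start_hour=0, end_hour=8),
    Layer(["Chloe", "Dev"], FIRST_HANDOFF, WEEK, start_hour=8, end_hour=16),
    Layer(["Eve", "Finn"], FIRST_HANDOFF, WEEK, start_hour=16, end_hour=24),
]
```

Calling `who_is_on_call(follow_the_sun, datetime(2017, 1, 3, 9))` picks the second layer, because 09:00 UTC falls in its window, and nobody in the other two layers is paged outside their own hours.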
On-call schedules should stay consistent so there is no confusion about whether you’re on or off. When changes are needed because a teammate is sick or going on vacation, save yourself from re-doing the entire schedule by making a one-time change with Overrides.
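One way to picture an override is as a temporary entry that takes precedence over the regular rotation for a fixed window and then expires, leaving the underlying schedule untouched. The sketch below is an illustration only; the `Override` class and `resolve_on_call` function are hypothetical names, not part of any PagerDuty API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Override:
    """Hypothetical one-time change: `user` covers from `start` up to `end`."""
    user: str
    start: datetime
    end: datetime  # exclusive


def resolve_on_call(scheduled_user: str, overrides: List[Override], at: datetime) -> str:
    """Return the overriding user if an override covers `at`, else the scheduled one."""
    for override in overrides:
        if override.start <= at < override.end:
            return override.user
    return scheduled_user


# Assumed example: Dev covers for a sick teammate for two days.
sick_cover = [Override("Dev", datetime(2017, 1, 4), datetime(2017, 1, 6))]
```

With this list, `resolve_on_call("Chloe", sick_cover, datetime(2017, 1, 5))` returns Dev, while any time outside the two-day window falls back to Chloe, the person the regular rotation would have chosen.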
Tip: Many customers hand off their weekly on-call duties on a weekday so both parties are in work mode. Many customers also note that a lot of their company holidays fall on Mondays, so they hand off on-call shifts on Tuesdays, in the middle of business hours, for a smoother transition.
Have Backups for All Alerts
Even the most reliable engineer can miss an incident from time to time. To ensure that missed alerts don’t prolong outages, Escalation Policies automatically re-route alerts to the standby on-call engineer. Alerts can be escalated to specific Users or to Users who are part of an on-call schedule. With Primary and Secondary On-Call Schedules, the incident is assigned to an initial owner and, should anything happen, a teammate is available as a backup to catch missed alerts. Setting up a Primary and Secondary On-Call Schedule follows the same process as setting up a general on-call rotation.
Escalation Policies should always be set in order to catch missed alerts. Although being on a Secondary On-Call Schedule may seem to make your on-call life twice as long, you are unlikely to have to take action, because you are backing up a responsive engineer. And when you do, you know you are helping out a teammate who may be stuck in an emergency.
Tip: Primary and Secondary On-Call Schedules should be staggered so the same person isn’t on-call on both schedules at the same time. After alerts have escalated past the primary and secondary on-call engineers, have a manager be the anchor. Usually, alerts will be responded to before they get to that point, but as a fail-safe in case the last person on the Escalation Policy misses the incident as well, you can program the Escalation Policy to cycle through its levels multiple times.
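The escalation flow described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not PagerDuty's implementation: `EscalationPolicy`, `route_alert`, `is_staggered`, and the `repeats` field are hypothetical names, and a missed alert is represented by a callback that returns `False` instead of by a real timeout.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class EscalationPolicy:
    """Hypothetical policy: ordered levels plus extra passes through them."""
    levels: List[str]   # e.g. primary, then secondary, then the manager anchor
    repeats: int = 0    # how many additional times to cycle after the first pass


def notification_order(policy: EscalationPolicy) -> List[str]:
    """List every responder in the order they would be notified."""
    return policy.levels * (1 + policy.repeats)


def route_alert(policy: EscalationPolicy,
                acknowledges: Callable[[str], bool]) -> Optional[str]:
    """Escalate until someone acknowledges; None means every pass was missed."""
    for responder in notification_order(policy):
        if acknowledges(responder):
            return responder
    return None


def is_staggered(primary: List[str], secondary: List[str]) -> bool:
    """Check that no one holds the same slot in both rotation orders."""
    return all(p != s for p, s in zip(primary, secondary))


# Assumed example: three levels, cycled through three times in total.
policy = EscalationPolicy(["primary", "secondary", "manager"], repeats=2)
```

For instance, `route_alert(policy, lambda who: who == "secondary")` stops at the secondary engineer once the primary has missed the alert, and `is_staggered(["Ann", "Bo", "Cy"], ["Bo", "Cy", "Ann"])` confirms that offsetting the secondary rotation by one slot keeps the same person off both schedules at once.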
Always Know When You’re On (And Off!)
When you’re off-call, you don’t want to spend personal time thinking about when you’re going to be on-call again. With PagerDuty’s Hand-Off Notifications, we’ll let you know when your shift starts and ends. Additionally, if you prefer to keep your PagerDuty on-call schedule in Gmail or Outlook, use Export Schedules or sync through iCal.
How PagerDuty On-Call Scheduling and Alert Routing Work together:
Have a Life, Even While On-Call
Why make on-call harder for yourself? Major outages happen and you want to know as soon as possible, but they don’t occur all the time. Unchain yourself from your desk with PagerDuty alerts – we’ll find you wherever you are. And with PagerDuty, when you’re off, you’re really off, so you can fully recharge and be ready for your next round of on-call duty.