Why We Use On-Call Shadowing
by Max Timchenko
March 26, 2019
On-call shadowing is an essential practice at PagerDuty. For a new engineer, a shadowing period serves as a kinder, smoother ramp-up to going on-call, with none of the stress or responsibility for diagnosing and fixing the issue.
When we configure shadowing in PagerDuty, our goal is to simulate the process and actions of going on call as precisely as we can while making sure that the actions of the “Shadow User” do not affect the primary engineer who is actually on call. This lets the Shadow User become confident and comfortable with our processes while the primary on-call responder works without interference.
The first step is to set up a dedicated PagerDuty account for shadowing. On this Shadow Account, we add services and email integrations for teams being shadowed. One service per team is enough if your team only has one person shadowing at a time; more may be necessary if you have multiple people shadowing the same team.
The email addresses used by these integrations are, in turn, set up as notification methods for a Shadow User on the main PagerDuty account, which is added to the main account’s on-call escalation policy for the team to be shadowed.
As a result of this setup, when an incident comes in, a notification is sent to both the primary on-call and the Shadow User on the primary account. The Shadow User's notification is an email to the Shadow Account's email integration, which creates a separate incident with the same information in the Shadow Account. The Shadow Account then notifies whoever is shadowing and, because it is now a separate incident, they can do anything they want (acknowledge, snooze, add comments, etc.) without changing the status of the actual incident that the primary on-call is working on.
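To make the flow concrete, here is a minimal toy model in Python. It is not PagerDuty code: the `Account` and `Incident` classes and the `trigger_on_primary` helper are hypothetical names invented for illustration. It only demonstrates the property we care about, namely that the shadow incident is an independent copy.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    title: str
    status: str = "triggered"


@dataclass
class Account:
    """A toy stand-in for a PagerDuty account: just a list of incidents."""
    name: str
    incidents: list = field(default_factory=list)

    def open_incident(self, title: str) -> Incident:
        incident = Incident(title)
        self.incidents.append(incident)
        return incident


def trigger_on_primary(primary: Account, shadow: Account, title: str):
    """Open an incident on the primary account and mirror it to the shadow account.

    The mirror step models the Shadow User's notification email arriving at
    the Shadow Account's email integration, which opens an independent copy.
    """
    real = primary.open_incident(title)
    copy = shadow.open_incident(title)
    return real, copy


primary = Account("primary")
shadow = Account("shadow")
real, copy = trigger_on_primary(primary, shadow, "Labs API latency high")

# The person shadowing acknowledges their copy; the real incident is untouched.
copy.status = "acknowledged"
print(real.status, copy.status)
```

Because the two incidents are separate objects that share only their title, nothing done to the copy can reach the original.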
A side benefit of this configuration is that, once set up, the primary account configuration remains static. People who shadow are added, removed, and configured entirely in the Shadow Account. Another benefit is that the shadowing schedule can be modified to exclude weekends, on-call handover days, etc., and can even be restricted to business hours only. (Note that the Shadow User should not be added to the on-call rotation.)
Let’s walk through the steps in detail. If any of the steps are unclear, check out this configuration and responder training webinar that walks through many of the configuration steps used below.
If you don’t already have a PagerDuty account, you can set up a free trial to follow along with no strings attached.
Start by creating a placeholder user on the Shadow Account. When there’s no one shadowing, this is the user that all shadow schedules on the Shadow Account will use. It does not page anyone. When there are people shadowing, they will override the placeholder user on the corresponding schedule.
The following three entities (schedule, escalation policy, and service) are created per team and per simultaneous shadow. This keeps everything organized and separate. If you are setting up a single shadow position on a single team, one set is all you need.
For this example, we’re using a fictitious “Labs” team. Add the placeholder user to the schedule and configure the schedule to what works best for your team.
Assign the newly created schedule to an escalation policy.
Create a service, enable the email integration, and note the email address (in this case, firstname.lastname@example.org). This is where notifications from the primary account will be sent. Choose the escalation policy created during the previous step. Other settings can stay at their defaults.
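If you are planning shadowing for several teams, it can help to list the sets of entities you will need before clicking through the UI. The sketch below is only a planning aid under assumed conventions: the `ShadowSet` class, the `plan_shadow_sets` function, and the naming scheme are hypothetical and not something PagerDuty prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShadowSet:
    """One schedule, escalation policy, and service for a single shadow position."""
    schedule: str
    escalation_policy: str
    service: str


def plan_shadow_sets(slots_per_team: dict) -> list:
    """Return one ShadowSet per team and per simultaneous shadow position."""
    sets = []
    for team, slots in slots_per_team.items():
        for slot in range(1, slots + 1):
            prefix = f"{team} Shadow {slot}"  # hypothetical naming convention
            sets.append(
                ShadowSet(
                    schedule=f"{prefix} Schedule",
                    escalation_policy=f"{prefix} Escalation Policy",
                    service=f"{prefix} Service",
                )
            )
    return sets


# A single shadow position on the "Labs" team needs exactly one set.
plan = plan_shadow_sets({"Labs": 1})
for shadow_set in plan:
    print(shadow_set)
```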
Switch to your primary account and create a Shadow User that will generate events for the Shadow Account. There is only one Shadow User per team, even if there are multiple shadow services on the Shadow Account—by listing multiple email addresses as contact methods, one user can notify all the shadow services.
Set up the notification rules to generate an email immediately to all the corresponding shadow services.
You likely have a schedule for your on-call escalation policy for the “Labs” team on the primary PagerDuty account. Add the Shadow User to the escalation policy alongside that schedule, so that the Shadow User will be notified regardless of who is on call. Important: Do not add the person who is shadowing directly to this escalation policy; doing so means this person’s actions can interfere with the actual incident handling.
This completes the one-time configuration. Now you can test whether it actually works.
Choose a service used by the team on the primary account and manually trigger an incident.
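Triggering from the web UI is the simplest option. If you would rather script the test, one possible approach is to send a trigger event through the PagerDuty Events API v2, assuming the chosen service has an Events API v2 integration. In the sketch below, the integration key, the `source` value, and the function names are placeholders; the send call is left commented out so the block is safe to run as-is.

```python
import json
import urllib.request

EVENTS_API_URL = "https://events.pagerduty.com/v2/enqueue"


def build_test_event(routing_key: str, summary: str) -> dict:
    """Build an Events API v2 trigger event for a test incident."""
    return {
        "routing_key": routing_key,  # the service's integration key (placeholder here)
        "event_action": "trigger",
        "payload": {
            "summary": summary,
            "source": "shadowing-setup-test",  # hypothetical source name
            "severity": "warning",
        },
    }


def send_event(event: dict) -> int:
    """POST the event to the Events API and return the HTTP status code."""
    request = urllib.request.Request(
        EVENTS_API_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


event = build_test_event("YOUR_INTEGRATION_KEY", "Shadowing setup test for Labs")
print(json.dumps(event, indent=2))
# send_event(event)  # uncomment once a real integration key is in place
```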
The incident will show up on the primary account.
It will also show up on the Shadow Account, with the placeholder user being paged.
Acknowledge and resolve the shadow incident, and note that no changes were made to the incident on the primary account. Perfect! That’s what we were aiming for.
When someone wants to shadow, invite them to the Shadow Account. Ask the user to set up notification methods and other user information as usual, and then set them as override on the shadow schedule. When the shadowing period ends, delete the user.
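The relationship between the placeholder user and an override can be sketched in a few lines of Python. This is a simplified model rather than PagerDuty's implementation; the `Override` class, the function name, and the user names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Override:
    user: str
    start: datetime
    end: datetime


def who_is_on_shadow_schedule(at: datetime, placeholder: str, overrides: list) -> str:
    """Return the overriding user if an override covers `at`, else the placeholder."""
    for override in overrides:
        if override.start <= at < override.end:
            return override.user
    return placeholder


overrides = [
    Override("new-engineer", datetime(2019, 4, 1, 9), datetime(2019, 4, 5, 17)),
]

during = who_is_on_shadow_schedule(datetime(2019, 4, 2, 12), "placeholder", overrides)
after = who_is_on_shadow_schedule(datetime(2019, 4, 8, 12), "placeholder", overrides)
print(during)  # new-engineer
print(after)   # placeholder
```

Once the override ends, the schedule falls back to the placeholder user, which pages no one.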
At PagerDuty, having separate services and schedules for shadowing allows us to modify shadowing times. Instead of following the primary’s schedule (which is often 24 hours a day for a week at a time), we can choose to lighten the shadowing load by excluding weekends or restricting shadowing to business hours only.
Excluding specific days is easiest to do using the “Restrict on-call shifts to specific times” feature. Setting up business-hours-only shadowing is easiest from a shadow service (e.g., “How should responders be notified?” and “Use defined support hours”).
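Both restrictions are configured in the PagerDuty UI, but the rule they express is simple enough to write down. The following sketch is an illustration only: the function name, the default of excluding Saturday and Sunday, and the 09:00 to 17:00 window are assumptions, not PagerDuty defaults.

```python
from datetime import datetime, time

WEEKEND = frozenset({5, 6})  # datetime.weekday(): Monday is 0, Saturday 5, Sunday 6


def is_shadowing_time(at: datetime, excluded_weekdays=WEEKEND, business_hours=None) -> bool:
    """Return True if a shadow should be notified at the given moment.

    excluded_weekdays: weekday numbers on which nobody shadows.
    business_hours: optional (start, end) pair of datetime.time values;
    when None, the remaining days are covered around the clock.
    """
    if at.weekday() in excluded_weekdays:
        return False
    if business_hours is not None:
        start, end = business_hours
        return start <= at.time() < end
    return True


tuesday_3am = datetime(2019, 3, 26, 3, 0)
saturday_noon = datetime(2019, 3, 30, 12, 0)
office_hours = (time(9, 0), time(17, 0))  # assumed business hours

print(is_shadowing_time(tuesday_3am))                               # True
print(is_shadowing_time(saturday_noon))                             # False
print(is_shadowing_time(tuesday_3am, business_hours=office_hours))  # False
```

Adding a handover day to `excluded_weekdays` (for example `WEEKEND | {2}` for a Wednesday handover) leaves four shadowing days per week.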
Remember: When setting up a shadowing practice at your organization, be mindful that if the person shadowing is added to the on-call rotation, they become the primary on-call, and if they are added directly to the primary account's escalation policy, their actions can interfere with incident handling.
PagerDuty is transparent in its shadowing practices because we want everyone at the company to know what PagerDuty does. We encourage everyone, regardless of their position within the company, to spend a week shadowing an engineering team using PagerDuty to understand what our product does and how to use it. Several of our teams with a weekly on-call rotation have set their default shadowing schedule to exclude weekends and the day that the on-call is transferred from one engineer to the next, resulting in a 4-days-a-week, 24-hours-a-day “shadowing shift.”
Most of our teams let new engineers decide when they want to start shadowing and when they feel ready to join the on-call rotation. Our expectation is that shadowing will begin sometime during the first three months, and our culture of shared responsibility and blamelessness makes it less daunting to make the switch from shadowing to being on call.
Now that you know how to set up shadowing like we do at PagerDuty, we encourage you to make use of this essential practice for better on-call experiences and smoother onboarding of your engineers.