Since our first on-call best practices post back in March 2011, on-call scheduling methods have remained mostly unchanged. Many teams start off by sending email alerts to the entire team and waiting for someone to volunteer to resolve the incident. With this model, a few superhero team members end up handling a disproportionate number of incidents, while new hires never get the opportunity to learn how to fix them.
Worst of all, everyone is on-call all the time. As your team grows and responsibilities are divided, an on-call rotation system is needed. It’s not easy to implement though; your teammates may be based in multiple cities, schedules change, and each engineer has their own preferred method of being alerted. You need a system that’s flexible enough to address these issues and robust enough to perform reliably.
The Current State of On-Call Scheduling
There are several on-call scheduling methods organizations use today. Some are more sophisticated than others, but each has its own limitations.
1. Unfair On-Call Burden
A simple, common on-call solution is to use a single dedicated phone or pager that gets handed off to the next on-call engineer. Although this may sound antiquated, many organizations we talked to have used this method. If your team is spread across various cities, some members cannot participate because they are out of range. This creates an unfair burden for some of your superhero teammates.
2. Delayed Response Time
Another simple – but labor-intensive – option is to staff a 24/7 network operations center (NOC). This method involves paying staff to monitor metrics all day and identify problems themselves. When an issue arises, they have to look up the appropriate contacts in a directory and notify the on-call personnel to resolve the situation. It would be much easier for your NOC team to centrally manage an on-call scheduling system that directly notifies the right on-call person, which decreases your mean time to response.
3. Alert Fatigue
Some companies keep it simple by sending email blasts to their entire team. In this model, the team members scheduled to be on call are responsible for monitoring their email 24/7, while everyone else on the email list has to manually delete the alerts. This creates spam and decreases the sense of urgency when alerts are received.
4. Alerts Slip Through the Cracks
A more sophisticated option involves automating around the alert email address in your monitoring tool. For example, you could set up Google Calendar with the on-call schedule and use a script that polls the calendar. The script would take the email of the on-call staff and update the monitoring tool when there is a change. However, this solution only supports single-level on-call scheduling. It doesn’t allow for escalation scenarios, where the primary engineer misses the first alert and the secondary on-call teammate needs to be notified.
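To make the missing piece concrete, here is a minimal sketch of multi-level escalation: notify each responder in order and stop as soon as one acknowledges. This is only an illustration, not how any particular tool implements it. The names `Responder`, `escalate`, `notify`, and `acknowledged` are hypothetical, and a real system would also wait for a timeout between levels.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Responder:
    """A hypothetical on-call contact."""
    name: str
    email: str


def escalate(
    chain: Sequence[Responder],
    notify: Callable[[Responder], None],
    acknowledged: Callable[[Responder], bool],
) -> Optional[Responder]:
    """Walk the escalation chain until someone acknowledges the alert.

    `notify` sends the alert to one responder; `acknowledged` reports
    whether that responder picked it up (in practice, after a timeout).
    Returns the responder who took the alert, or None if nobody did.
    """
    for responder in chain:
        notify(responder)
        if acknowledged(responder):
            return responder
    return None
```

Calling `escalate([primary, secondary], send_email, wait_for_ack)` with your own `send_email` and `wait_for_ack` functions would alert the primary first and fall through to the secondary only if the primary does not respond.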
5. No Central Source Of On-Call Schedules
Some monitoring tools support on-call scheduling natively via CSV uploads, but with limited flexibility. Often, your choices are limited to daily (as opposed to hourly) rotations or simplistic schedules. They don’t allow for more complex on-call scheduling such as follow-the-sun schedules. Many companies have multiple monitoring tools for their website, server, database, etc. Setting up and managing multiple monitoring tools just for on-call scheduling is a pain.
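For readers unfamiliar with follow-the-sun schedules, the idea is that on-call duty passes between offices so each one covers its own working hours. A rough sketch, assuming three hypothetical offices that each cover an eight-hour block expressed in UTC (the office names and shift boundaries are made up for illustration):

```python
from datetime import datetime, timezone

# Hypothetical shifts as (start_hour_utc, end_hour_utc, office).
# Each office covers eight hours; together they cover the full day.
SHIFTS = [
    (0, 8, "Sydney"),
    (8, 16, "London"),
    (16, 24, "San Francisco"),
]


def on_call_office(now: datetime) -> str:
    """Return the office on call at `now` (must be timezone-aware)."""
    if now.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    hour = now.astimezone(timezone.utc).hour
    for start, end, office in SHIFTS:
        if start <= hour < end:
            return office
    raise ValueError(f"no shift covers hour {hour}")
```

A daily-only rotation cannot express this hand-off, because the on-call owner changes several times within a single day.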
If you suffer from any of the issues above, you’re in need of a cure. It’s time you turn to an incident management remedy to alleviate your on-call scheduling ailments, and to preserve your mental health. Don’t be shy if you are feeling these discomforts. We have personally experienced these symptoms and that’s why we created the PagerDuty cure.