“You code it, you own it” means engineers are called when the software and systems they’ve built fail in production, and it’s their responsibility to get everything working again. However, managers and business stakeholders aren’t usually on-call, so they don’t see or feel the pain of being paged. This can lead to work prioritization decisions that lack empathy and fail to take into account the responsibility we all have for operational resiliency. Managers push for new features and higher output over work that addresses operational pain. Engineers see problems and feel powerless to solve them. Over time, this conflict results in expensive outages that hurt the team, the business, and customers.
Small issues are usually an early warning sign of more serious problems. If they’re fixed as soon as they arise, bigger problems can be avoided in the long run and your team and customers stay happy.
So, how do we get proactive and make fixing operational problems a habit? Empowering the team with effective on-call handoff sessions is a great place to start!
When our on-call team members go off duty and hand the baton to their teammates, we use this time to expose operational problems, discuss solutions, and empower the team to initiate action. Here are a few tips for effective on-call handoff sessions based on my experience of being on-call at a number of companies, including PagerDuty.
It’s easy to miss the problems engineers face while on-call if the team only talks about operational problems in engineering chat rooms. We hold regular, dedicated handoff sessions to encourage reflection and create a bias for proactive action that addresses root causes. Our schedules usually change once a week, so we hold the meeting on the day of the changeover.
Being on-call and waking up to incidents can be disruptive and stressful. We include other stakeholders in the on-call handoff meeting to build a sense of camaraderie and empathy, which ultimately leads to better decision making across the organization.
Our product managers benefit from understanding the impact of operational pain on engineers and customers. Sitting in on handoff sessions lets PMs hear the impact of their prioritization decisions and ensure that both product and technical initiatives move forward during work planning sessions.
The goal of engineering leaders is to foster a team culture where individuals are happy, motivated, creative, and engaged. By observing on-call handoff sessions and listening carefully to concerns, people managers get exposure to insights that may not surface in team or one-on-one meetings. Following the session, leaders can take action to provide support and resources. For example, they can encourage engineers to take well-deserved time off or help prioritize the team’s technical and operational recommendations.
It’s easy for teams to get accustomed to disruption when it builds up gradually over time, especially if no one is taking a holistic view and noticing worrying patterns. Reviewing metrics during the handoff session promotes a culture of observability that lets the team see the true picture of operational health, both infrastructural and human.
Here are metrics and tools we’ve found useful during our handoff sessions:
Team disruption statistics: PagerDuty provides valuable data and graphs showing total incidents by service, team, and user. Comparing counts at each review allows us to reflect on patterns and discuss solutions.
Chat history: With a chat integration (Slack, HipChat, etc.), all incident notifications can be sent to a dedicated channel. Our engineers chat in the same channel as the incident notifications, so it’s easy to identify and analyze conversation threads that show trending topics and concerns.
Use PagerDuty’s Public APIs to create custom reports and apps: PagerDuty’s APIs let you build reports and apps tailored to your business. For example, we’ve created an extension that gives an instant picture of how much out-of-hours disruption the on-call team members have had, based on the time of day and frequency of high-priority incidents. By sharing this view across the team in the handoff session, we see a picture of team health that motivates us to take action.
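To make the out-of-hours idea concrete, here is a minimal Python sketch of how such a report might be computed. It is not the extension described above, and it makes no API calls. It assumes the incident records have already been exported as dictionaries; the field names (`created_at`, `urgency`, `responder`), the 9:00 to 18:00 working day, and the sample data are all assumptions made for illustration, and timestamps are assumed to be in each responder’s local time.

```python
from collections import Counter
from datetime import datetime, time

# Assumed working day; adjust to match your team's hours.
WORK_START = time(9, 0)
WORK_END = time(18, 0)


def is_out_of_hours(ts: datetime) -> bool:
    """Return True for weekends and for weekday times outside the working day."""
    if ts.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return True
    return not (WORK_START <= ts.time() < WORK_END)


def out_of_hours_disruption(incidents):
    """Count high-urgency, out-of-hours incidents per responder.

    Each incident is a dict with hypothetical keys:
    'created_at' (ISO 8601 string with UTC offset), 'urgency', 'responder'.
    """
    counts = Counter()
    for incident in incidents:
        if incident["urgency"] != "high":
            continue
        created = datetime.fromisoformat(incident["created_at"])
        if is_out_of_hours(created):
            counts[incident["responder"]] += 1
    return counts


# Made-up sample data for one on-call rotation.
SAMPLE = [
    {"responder": "alice", "urgency": "high", "created_at": "2019-03-04T02:15:00-08:00"},
    {"responder": "alice", "urgency": "high", "created_at": "2019-03-04T14:00:00-08:00"},
    {"responder": "bob", "urgency": "high", "created_at": "2019-03-02T11:30:00-08:00"},
    {"responder": "bob", "urgency": "low", "created_at": "2019-03-05T23:00:00-08:00"},
    {"responder": "alice", "urgency": "high", "created_at": "2019-03-05T22:40:00-08:00"},
]

if __name__ == "__main__":
    for responder, count in out_of_hours_disruption(SAMPLE).most_common():
        print(f"{responder}: {count} out-of-hours high-urgency incident(s)")
```

Comparing these per-person counts from one handoff session to the next is one simple way to spot a worrying trend before it becomes the norm.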
Areas of concern uncovered during on-call handoff sessions must be followed up with concrete actions. PagerDuty’s Jira integration makes it easy to track unplanned work from right inside an incident. It’s then just a short step to assign this work to the on-call engineer (see the next section, “Reinforce expectations for on-call duties,” to understand how this works).
When areas for improvement are recorded and tied to concrete actions, those improvements are much more likely to happen.
Remember to review the results of changes in subsequent on-call handoff sessions and adjust your approach based on what you learned.
Many teams fall into the trap of failing to set clear expectations for on-call, seeing it as just “part of the job” rather than a dedicated, critical role. How can you stay out of this trap? We set clear expectations for the role.
At the on-call handoff session, it’s important to check in on these expectations and reinforce the message that operational improvement requires effort. Humans need time and space to be able to focus on it. They also need downtime and a sustainable workload.
For more advice on best practices for being on-call, check out our On-Call Survival Guide.
Having engineers on-call is an effective way to encourage continuous improvement and system stability. However, it only works if everyone in the organization understands how to play their part in making it successful. Even if you are not an engineer, your decisions are likely to have unintended side effects on the well-being of engineers and the systems they’re building. Getting involved in on-call handoff sessions and encouraging proactive resolution of problems leads to happy teams and successful products. I encourage you to look at your own organization and reflect on ways you can build empathy across teams using similar techniques. Share your ideas and suggestions in our Community forum!