As every business evolves into a digital business, and micro-moments matter more than ever to both revenue and brand equity, the need for effective communications in the face of crisis has grown. There is, of course, more pressure on the technical teams who develop and manage these services, but digital transformation is also putting pressure […]

When a customer outage occurs, its impact is felt across the organization. While the technical response is underway, stakeholders from public relations, customer support, legal, and the executive team must all be engaged and kept informed. But as teams become more global and distributed, coordinating streamlined internal and external communications and response only gets harder. You need […]

| In Best Practices & Insights, DevOps, Tech Talk, Technology

Monitoring is pivotal to keeping your ITOps architecture proactive. In recent years, we have seen an explosion in both the number and the types of tools classified as “monitoring” tools. While this ever-increasing tools landscape has vastly increased ITOps visibility, the occasional side effect of integrating this vast array of tools is to […]

Having one person on-call isn’t enough. What happens if your on-call engineer sleeps through their alert? What happens if their phone’s battery dies without them knowing, or if they get an alert at a really inconvenient time, like when stuck on a bus or in traffic? It will happen. We present best practices for backup: one or more people, waiting in the wings, ready to spring into action if your primary on-call is unable to perform their duties at any given time.

| In DevOps, Redirect

Since we launched on-call handoff notifications, lots of our customers have used them to be notified about their on-call responsibilities to make sure they never forget when they’re on-call. Over the years, we’ve seen a variety of on-call schedules and thought we’d share some of the more favored practices we’ve seen. Exchange Shifts During Business […]

| In Alerting, Operations Performance

Whenever we meet someone, the first question we are asked is what we do for a living. We are always on the job, even though we try our hardest not to be. While this can cause stress or worry, it also creates a sense of ownership over our responsibilities. None of us wants someone else […]

| In Alerting, Operations Performance

It’s easy to feel underutilized as an engineer working in a NOC. Especially in larger organizations, you may find yourself siloed into owning highly specific responsibilities. At PagerDuty, we don’t believe that any engineer should sit around, wasting time, watching lines on graphs move up and down. You’re too smart to waste your talents. […]

Anything can happen while you’re on-call. You can experience a quiet, incident-free shift or suffer a severe outage that makes your head explode. Since you don’t know what you’ll get, you always have to be prepared for anything. Being on-call is stressful enough as is, so we strive to make it less painful with easy scheduling […]

| In Alerting, Operations Performance

Last week, we gave some suggestions for how you can spend your time when you are on-call. However, here are some things that you absolutely should not do while on-call. Don’t Run Away from Civilization and Become Feral Working on screens all day can occasionally make you want to run into the woods and never […]

| In Alerting, Operations Performance

In a recent survey we conducted of on-call engineers, 51.5% of people stated that while on-call during non-business hours they like to spend time with their friends and family. But an alarming 36% indicated that they feel they are stuck at home doing nothing, just in case something breaks! Being on-call no longer means you need […]

| In Features

Long gone are the days of emails being primarily used to catch up with friends and forward those annoying chain letters so you aren’t cursed with bad luck for seven years. Today, email is our new “official” communication. It is the home for our bank statements, legal advisor notices, password retrievals, subscriptions to things that […]

| In Alerting, Operations Performance

The On-Call Scheduling Best Practices Series is back! In the first on-call best practices series, we covered what equipment is needed and how people want to be alerted while on-call. In the second part of the series, we covered who should be part of an on-call escalation policy. The best way to deal with any […]

| In Reliability

As a general rule, whatever percentage you think your test coverage is, it isn’t. Whatever amount of the known surface area you’re covering, there’s going to be an exciting swath of things you didn’t realize that you need to test. Analytics fell into that bucket for us. We use Google Analytics in our webapp to […]

| In Features

Tired of getting a flood of PagerDuty incidents whenever a problem occurs with one of your systems?  Do many of the incidents seem identical?  Do you spend valuable time trying to fend off the seemingly never-ending PagerDuty phone calls and SMS messages while you should be fixing the actual problem?  Then you, my friend, might […]

This is Part 1 in a multi-part series dealing with tips for being on-call.