Customers are loyal to companies whose values they feel they share. So when an unexpected event strikes a company, the resulting upheaval places the brand at risk. Whether it’s an airline’s blunder or a performance incident involving technology systems, these moments affect the customer experience by taking down your ability to serve […]

Recently, I was putting together training material for our upcoming track on “Owning Incident Response” at PagerDuty University, and I listened to the recordings of incident calls across many years of PagerDuty history. Several hours of hearing my coworkers at 2x speed prompted two observations: first, I should go find my copy of Christmas with the […]

| In DevOps, PagerDuty Life, Tech Talk

On June 28th, 2017, we marked four years of performing “Failure Fridays” at PagerDuty. As a quick recap, Failure Fridays are a practice we conduct weekly at PagerDuty to inject faults into our production environment in a controlled way, and without customer impact. They’ve been foundational for us to verify our resiliency engineering efforts. Over […]

Credit: NASA

Organizations need many incident commanders to provide a high level of service to their customers while avoiding on-call load. Many shy away from becoming an incident commander because they assume only senior technical leads can be one. However, soft skills are actually more important, and with a well-defined process like the one outlined […]

Today, we’re excited to announce a suite of new functionality to power even faster resolution and accelerate learning from major business-impacting incidents with the definitive Incident Resolution Lifecycle. With this release, we help you to differentiate major incidents from other day-to-day operational issues, and easily adopt best practices to streamline incident resolution and learning in […]

| In DevOps, Response Orchestration

Incident response (IR) is a process used by ITOps, DevOps, and dev teams to address and manage any sort of major incident that may arise. The main goal of IT incident response is to organize an approach that limits damage, reduces recovery time and costs, and prevents the incident from happening again. Incident response […]

| In Collaboration, On-Call Life

In today’s integrated digital economy, the IT infrastructures at most corporations can no longer exist in silos. The overwhelming benefit of integration is the rapid development of new ideas and solutions. The unfortunate downside is that increased integration and connectivity also place our respective organizations at risk for cyber attacks, computer viruses, and infrastructure problems […]

| In ITOps, Response Orchestration

Incident response bottlenecks: you know they’re real, and you know that your incident response system probably has a few, but they must be minimized as they hurt your on-call teams and your customers. Let’s take a look at some of the most critical bottlenecks and how to avoid them.

What Are Your Goals?

First, before you […]

| In Product, Response Orchestration

| In Alerting, Operations Performance

New Zealand is located on the southern tier of the Pacific “Ring of Fire”, which makes it no stranger to seismic activity. On average, there are about 10 earthquakes per day that are felt by people in New Zealand! Most of the earthquakes experienced by Kiwis are small – 4.0 or less on the Richter […]