Posts by John Laban

Alerting

On-Call Best Practices: Page Your Manager

Having one person on-call isn't enough. What happens if your on-call engineer sleeps through their alert? What happens if their phone's battery dies...

Reliability

Outage Post Mortem – June 3rd & 4th, 2014

On June 3rd and 4th, PagerDuty’s Notification Pipeline suffered two large SEV-1 outages. On the 3rd, the outage resulted in a period of poor...

Alerting

Approaching the Hiring of Engineers as a Machine Learning Problem

Hiring software engineers is hard. We all know this. If you get past the problem of sourcing and landing good candidates (which is hard in...

Reliability

Outage Post Mortem – March 15

As some of you know, PagerDuty suffered an outage for a total of 15 minutes this morning. We take the reliability of our systems very...

Reliability

Pressure Release Valves

This is the fourth in a series of posts on increasing overall availability of your service or system. Have you ever gotten paged, and known...

Reliability

A Standard Operating Procedure for when s*IT hits the fan

This is the third in a series of posts on increasing overall availability of your service or system. In the first post of this series, we...

Reliability

More control over Optimistic Locking in Rails

Like pretty much everything else in Rails, optimistic locking is nice and easy to set up: you simply add a “lock_version” column to your ...

Reliability

Availability lessons from shoe companies and ancient warlords

This is the second in a series of posts on increasing overall availability of your service or system. In the first post of this series...

Features

Getting the most out of PagerDuty: Incident De-Duping

Tired of getting a flood of PagerDuty incidents whenever a problem occurs with one of your systems? Do many of the incidents seem identical? Do...

Events

Velocity Contest Winners

Velocity 2011 was a blast! Thanks to everyone who came by our booth to find out more about PagerDuty, snag a t-shirt, and enter our contest...

Announcements

New APIs Available Now

Have you ever said to yourself: “PagerDuty is great, but I wish I could better integrate it into the custom tools I already use.” Or...

Events

See you at Velocity 2011

PagerDuty is excited to be attending the O’Reilly Velocity Conference 2011 next week in Santa Clara, CA. Velocity is a great venue that focuses on...

Announcements

PagerDuty Wants You!

We’re hiring! Interested in working with a team reinventing the stagnant world of IT operations software? Want a job hacking on a product with a...

Reliability

Standing on the shoulders of giants and stumbling with them – the Amazon AWS outage’s "pain" statistics

Today, at around 1am Pacific Time, Amazon began having major problems with some of their cloud infrastructure: specifically with their EC2, EBS, and...

Reliability

The ups and downs of Availability

This post is meant as a quick introduction to some concepts of system availability, so that subsequent posts in this series make sense. I'll go over...

Alerting

On-Call Best Practices: Part 1

This is Part 1 in a multi-part series dealing with tips for being on-call...