Animating Incidents with webhooks, Firebase and d3.js
We’re rolling out Webhooks on incidents and it opens up a lot of fun new things. For background, Webhooks let you receive HTTP callbacks when interesting...
PagerDuty’s June Hack Day was just the other day, and once again our staff presented some very cool projects. This month we highlight the winner...
2 min read
As a member of PagerDuty’s realtime engineering team, a top concern is designing and implementing our systems with high availability and reliability. On May 30,...
There are a ton of creative things that our customers do with the PagerDuty API. Justin Lintz at Chartbeat just released his hack week project...
2 min read
#ChefConf is a three-day annual conference featuring demonstrations, workshops, and keynote presentations on the future of infrastructure automation. It’s designed for users of the Chef...
4 min read
We spend an enormous amount of our time on the reliability of PagerDuty and the infrastructure that hosts it. Most of this work is invisible, hidden...
On January 24, 25 and 26, 2013, PagerDuty suffered several outages. The events API, used by our customers to submit monitoring events into PagerDuty from...
You’re a techie working for one of the multitude of startups that rushed to market, where the founders hastily glued a Rails app together with candy-bar wrappers and...
A few weeks ago I had the privilege of speaking at Surge 2012 in Baltimore, MD. The audience was made up of those whose focus was on better...
TL;DR: We brought our deploy time down from 10 minutes to 50 seconds. When I joined PagerDuty over a year ago, our application consisted of...
8 min read
This is a guest post by Connie Quach, Sr. Product Manager, responsible for the web performance products at Neustar. In today’s competitive environment, website performance...
Monitoring your infrastructure. It can be challenging, but that’s why you have all of the tools in place to make sure you don’t miss a...
At PagerDuty, we usually get a front seat to anything that’s wrong with the internet. Last weekend, a derecho storm took out 7% of AWS...
On Thursday, June 14, starting at 8:44pm Pacific time, PagerDuty suffered a serious outage. The application experienced 30 minutes of downtime, followed by a period...
As some of you know, PagerDuty suffered an outage for a total of 15 minutes this morning. We take the reliability of our systems very...
As a general rule, whatever percentage you think your test coverage is, it isn’t. Whatever amount of the known surface area you’re covering, there’s going...
This is the fourth in a series of posts on increasing overall availability of your service or system. Have you ever gotten paged, and known...
One of our goals this year is to attend more conferences outside of San Francisco, and after the Southern California Linux Expo in Los Angeles,...
2 min read