What monitoring tools do you use?
We support any monitoring tool that can send an email or make a JSON call, but we support tighter integration with some than others. We...
This is the third in a series of posts on increasing overall availability of your service or system. In the first post of this series, we...
Like pretty much everything else in Rails, optimistic locking is nice and easy to set up: you simply add a “lock_version” column to your ActiveRecord model...
This is the second in a series of posts on increasing overall availability of your service or system. In the first post of this series,...
PagerDuty is thrilled to be a sponsor for PuppetConf 2011. PuppetConf is a DevOps and Operations conference presented by Puppet Labs in beautiful Portland, OR...
1 min read
As you may already know, PagerDuty suffered an outage of 30 minutes yesterday, followed by a period of increased alert delivery times. We’re taking the downtime...
Updated on 9/21: We have replaced Twitter with our status page as a communication method. At PagerDuty we strive for 100% uptime, and it is a...
On August 8 – 10, we’ll be “staying classy” in San Diego, California as we attend HostingCon 2011. HostingCon is the premier conference and tradeshow...
2 min read
Velocity 2011 was a blast! Thanks to everyone who came by our booth to find out more about PagerDuty, snag a t-shirt, and enter our contest.
1 min read
PagerDuty is excited to be attending the O’Reilly Velocity Conference 2011 next week in Santa Clara, CA. Velocity is a great venue that focuses on...
1 min read
PagerDuty is hosting the June meet-up of the San Francisco Perl Mongers. Gaëtan Voyer-Perraul from MongoDB will be presenting "Perl + MongoDB => Mongoers + Fun".
1 min read
Today, at around 1am Pacific Time, Amazon began having major problems with some of their cloud infrastructure, specifically with their EC2, EBS, and RDS offerings. We'd like to share some statistics on the alerts we sent out, via phone or SMS, during the outage.
This post is meant as a quick introduction to some concepts of system availability, so that subsequent posts in this series make sense. I'll go over concepts like availability, SLA, mean time between failure, mean time to recovery, etc.
We’ve been hosting PagerDuty on AWS for about the last year. One of the biggest draws to the platform for us was the promise of ready-built components...