Standing on the shoulders of giants and stumbling with them – the Amazon AWS outage’s "pain" statistics
Today, at around 1am Pacific Time, Amazon began having major problems with some of their cloud infrastructure: specifically with their EC2, EBS, and RDS offerings. We’d like to share some statistics on the alerts we sent out – via phone or SMS – during the outage.
In Reliability
The ups and downs of Availability
This post is a quick introduction to some system availability concepts that later posts in this series build on. I’ll cover availability, SLAs, mean time between failures (MTBF), mean time to recovery (MTTR), and related terms.
In Reliability
Pivoting – Fixing The Public Transportation System
Introducing the Curated Aerial Non-Orbital Navigation System, or CANON.
On-Call Best Practices: Part 1
This is Part 1 of a multi-part series of tips for being on call.
In Alerting, Best Practices & Insights, Operations Performance
Fixing The Back Button: AJAX History And Bookmarks
We’ve added deep linking to the incidents table. The browser now remembers your interactions with the table as you navigate around your account or open your bookmarks.
In Reliability
Alerts Overage Warning
We’ve added a feature to notify customers when they approach and go over their alert quota.
PagerDuty & Server Density Integration
We’re pleased to announce that Server Density, a hosted monitoring service, supports out-of-the-box integration with PagerDuty.
Load Balancers need static IPs!
We’ve been hosting PagerDuty on AWS for about a year. One of the biggest draws of the platform for us was the promise of ready-built components…
In Reliability
3 Major New Features – Part 3: PagerDuty & Cloudkick Partnership
We’re pleased to announce that Cloudkick is the first monitoring tool to include out-of-the-box integration capability with PagerDuty.
In Announcements, Features, Partnerships, Product
3 Major New Features – Part 2: The Nagios -> PagerDuty API
Announcing the Nagios plugin for PagerDuty.
In Announcements, Features, Product
3 Major New Features – Part 1: Integration API
We are launching three new features: the Integration API, a Nagios plugin, and Cloudkick integration.
In Announcements, Features, Product
PagerDuty 2.0
We’re happy to announce that we’ve released the new version of PagerDuty with multi-incident support.
Preview release of the new "multi-incident" version of PagerDuty
Preview of multi-incident support in PagerDuty.