We have some very exciting news for all of our customers who are running mission-critical systems on AWS in the US-East region: we have migrated our primary stack for PagerDuty out of US-East and into US-West (Oregon). The migration took place on Tuesday, June 19. We kicked it off just after 7pm Pacific time and finished at 9:30pm.
Best of all, we ran the migration without any downtime! We will blog about how we designed and ran this no-downtime migration in the next couple of weeks, so stay tuned.
The main reason for this migration is that a significant percentage of our customers (over 20%) run on AWS in the US-East region. As a result, our failures were correlated with our customers’ failures: when Amazon had an outage in US-East, many of our customers in the region had issues (resulting in high load on PagerDuty) at the same time that we lost some of our own capacity. Needless to say, this was not a good situation. The move out of US-East resolves this correlated-failure scenario.
We will continue to maintain a fully redundant hot backup of the PagerDuty stack on a separate provider, just in case. In the longer term, we are moving to a multi-data-center setup and a multi-node data store (Cassandra). More details about the design of our new system will follow.