Evernote: Remembering Everything with PagerDuty

Alexei Rodriguez, VP of Operations

“PagerDuty makes sure the correct people are always contacted.”

Evernote is an application that lets you take notes, save webpages, and capture photos from your mobile device or computer. When you add content to Evernote, it is automatically synchronized to your other devices and can be shared with an online community. Because Evernote is focused on synchronizing data across devices, it’s imperative that the company limit potential data loss caused by downtime.

Early Evernote Challenges

Evernote’s users are mapped to specific servers across the world. If an outage occurs, it could put the content of up to 200,000 users in danger of being lost. Such a disaster could be irreparable, leading to a mass exodus of users. “It is very important for us to know when a crash happens so we can react appropriately and recover as quickly as possible,” says Alexei Rodriguez, Vice President of Operations at Evernote.

Evernote uses Nagios, Pingdom, and Splunk to monitor its systems and services. Before PagerDuty, running multiple monitoring systems made it challenging to keep track of alerts and to ensure that the right engineers were contacted every time. Each time an incident occurred, one of the monitoring systems would send an email or SMS to the on-call engineer and wait for that engineer to acknowledge the problem. If the engineer did not respond, the monitoring system would either alert the entire team or, worse, wake no one at all. “We had several times when an engineer would not see an email or hear an SMS and then the alert system would contact the entire team in an attempt to wake them up. Such a scenario is painful,” Rodriguez says.

To track who was on-call, Evernote used Google Calendar and manually imported the schedule into each of the monitoring systems. Problems arose any time scheduling changes were made. Rodriguez would have to change the calendar, then manually change the contact information in each of the different monitoring systems.

How Did PagerDuty Help?

For Evernote to be the best place to store memories, it was essential to keep downtime to a minimum. Evernote needed an alerting system that would contact the right engineer without fail. They turned to PagerDuty.

PagerDuty’s breadth of contact methods and monitoring tool integrations lets Evernote be confident that every alert reaches the right contact. “PagerDuty gives individual engineers the ability to establish how they would like to be notified,” states Rodriguez. “So if you’re a sound sleeper, you can have PagerDuty call you, or your house, immediately after an incident occurs rather than receiving an email. PagerDuty makes sure the correct people are always contacted.” With PagerDuty, Evernote employees decide for themselves how they will be contacted (by email, SMS, phone, or iOS or Android push notification) and at what time intervals. When something goes down, PagerDuty wakes Evernote up.
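The per-engineer preferences described above can be pictured as a small list of rules, each pairing a delay with a contact method. The sketch below is illustrative only: the rule format, the `sound_sleeper` example, and the `methods_due` helper are assumptions made for this example, not PagerDuty’s actual data model or API.

```python
from typing import List, Tuple

# Hypothetical rule format: (minutes after the incident is assigned, contact method).
NotificationRule = Tuple[int, str]

# An assumed profile for a "sound sleeper": phone call and push right away,
# then an SMS, then a second phone call if the incident is still open.
sound_sleeper: List[NotificationRule] = [
    (0, "phone"),
    (0, "push"),
    (5, "sms"),
    (10, "phone"),
]


def methods_due(rules: List[NotificationRule], minutes_elapsed: int) -> List[str]:
    """Return the contact methods scheduled to fire at exactly this minute."""
    return [method for delay, method in rules if delay == minutes_elapsed]


immediate = methods_due(sound_sleeper, 0)
follow_up = methods_due(sound_sleeper, 5)
```

Keeping the rules as data per engineer, rather than hard-coding them in each monitoring tool, is what lets each person tune their own notifications without touching the alert sources.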

PagerDuty also makes scheduling changes easy for Evernote. PagerDuty’s calendar clearly displays employees on call, how they can be reached, and the escalation policies if there is no response from the original on-call engineer.
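To make the idea of an escalation policy concrete, here is a minimal sketch of the logic: notify the on-call engineer, wait for an acknowledgement, and move to the next level only if none arrives. The names (`EscalationLevel`, `escalate`, the responder labels, the timeouts) are hypothetical and chosen for illustration; this is not how PagerDuty is implemented.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class EscalationLevel:
    responder: str         # hypothetical label for whoever is on call at this level
    timeout_minutes: int   # how long to wait for an acknowledgement


def escalate(levels: List[EscalationLevel],
             acknowledged: Callable[[str, int], bool]) -> Optional[str]:
    """Notify each level in order and return the first responder who acknowledges.

    `acknowledged(responder, timeout)` stands in for "page this person and
    wait up to `timeout` minutes"; it returns True if they acknowledged.
    """
    for level in levels:
        if acknowledged(level.responder, level.timeout_minutes):
            return level.responder
    return None  # every level timed out; the policy is exhausted


# Assumed three-level policy for illustration.
policy = [
    EscalationLevel("primary-oncall", 15),
    EscalationLevel("secondary-oncall", 15),
    EscalationLevel("ops-manager", 30),
]

# Simulate the primary missing the page and the secondary acknowledging it.
handled_by = escalate(policy, lambda who, _timeout: who == "secondary-oncall")
```

The point of the ordered list is that an unanswered page reaches one additional person at a time instead of waking the whole team at once.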

“PagerDuty has made my life easier because I no longer need to manage on-call rotations … I don’t need to think about who to contact.” – Gerardo López-Fernández, Operations Architect

PagerDuty’s on-call schedule ensures the right engineers respond to alerts, which gives Evernote engineers peace of mind that they will only be contacted when an incident requires their specific action. “With other monitoring systems alone, you could be on the road and get a page and be unable to do anything about it,” Rodriguez says. “The system would then wake up the entire team. PagerDuty’s ability to acknowledge and escalate the alert via mobile device is essential, so these situations don’t hurt us.”

“PagerDuty gets rid of the complexity that came from having multiple diagnostic systems contacting me. With PagerDuty, I do not get spammed with unnecessary alerts, even internationally.” – López-Fernández

PagerDuty lets Evernote maintain a separate on-call schedule for each team within the company. Evernote takes advantage of this feature by using one on-call schedule for its systems administrators and another for its management team. If an incident requires action from a manager rather than a sysadmin, PagerDuty makes sure the right person is contacted. Should an emergency arise, PagerDuty is there to make the lives of Evernote’s engineers less stressful.