Evernote is a cross-platform software-as-a-service application designed to help people be more productive by making it easier to take notes and manage information across web and mobile devices at all hours of the day.
Today, Evernote has over 220 million users across the globe, 80 percent of them outside the United States. With so many people depending on the platform, Evernote must ensure high service availability or risk unhappy customers and subscription cancellations. PagerDuty enables Evernote's engineers to respond quickly and minimize the customer impact of performance issues.
Understanding the Customer Journey Through Service-Level Objectives
Garrett Plasky, Evernote's SRE Manager, leads a team of site reliability engineers, DevOps engineers, and system administrators who are responsible for the health of Evernote's production service infrastructure and, ultimately, for customer happiness.
“In 2016, Evernote began a major evolution of its hosting infrastructure,” Plasky shared. “The update—which centered around a migration of many workloads to Google Cloud Platform—was part of an effort to democratize operations and enable engineers to move quickly, iterate, and build services.”
However, with increased agility came more responsibility. Evernote engineers were now responsible not just for building services but also for maintaining them in production. To do this effectively, they needed to track key performance indicators (KPIs) that could help them make informed decisions about how to maintain service-level objectives (SLOs) when problems arose in the infrastructure.
“These are the types of things that we’re monitoring and alerting for more—the full user journey, aka the things our users care about,” Plasky explained. “For instance, how long does it take you to open, create, and sync a note? We’re reframing the way we think about what’s important and looking at things more from the top of the funnel down instead of from the bottom up.”
Developing Insights to Empower Engineers and Improve Future Response
Looking at SLOs through the lens of the customer has also provided Plasky’s team with insights to make informed, real-time decisions about complex application environments. Evernote engineers are responsible for maintaining services that they create and have the authority to determine whether a given alert is serious enough to merit action. PagerDuty provides the data necessary to help Plasky’s team make decisions about the relevance of each incident, empowering engineers to work more effectively while still maintaining high service availability for end users.
PagerDuty's postmortem capabilities also enable Plasky and his colleagues to produce insightful, streamlined postmortems. "One challenge that we have as an Operations organization is to continue our mature and well-rounded incident response process, but also balance that with the fact that we don't want to spend two man-days putting together a postmortem report or have a three-hour meeting discussing an issue," Plasky said. By automating postmortem reporting, PagerDuty helps the team meet this challenge.
Evernote and PagerDuty: Growing Together
As Evernote continues to grow and evolve, PagerDuty will be right by its side. When Plasky joined Evernote in 2012, the company used PagerDuty only for alerting and notifications. Today, his team also uses PagerDuty to schedule on-call rotations and takes advantage of the platform's advanced analytics capabilities to gain a single source of truth for visibility into production issues.
Evernote plans to increase its use of microservices over the next year, and the company will be onboarding more product engineering teams as PagerDuty users so they can run their own services before handing them over to Plasky's team. Additional PagerDuty features and integrations also figure prominently in future plans, particularly postmortem templates and response plays, which will help Evernote continue to automate and improve its incident response process.
“We have different sources of data and alerting. But having them all funnel through PagerDuty has value because it makes it easy for us to see what happened, what went wrong and when,” Plasky shared. “PagerDuty is what wakes us up when something critical breaks, which is essential to keeping customers happy.”