Founded in 2010, Pantheon is the website management platform for Drupal and WordPress. More than just hosting, Pantheon’s platform includes all the tools professional developers need to build best-of-breed websites — like staging environments, version control, backups and workflows.
Nick Stielau is the Director of Engineering at Pantheon. He is responsible for organizing the engineering team, supporting planning, delivering new products and features, and maintaining existing infrastructure and functionality. As Pantheon grew their customer base, implementing an incident management solution became critical to managing their on-call resourcing and meeting their high uptime SLA expectations.
Replacing their previous alerting tool with a more scalable solution
Since Pantheon implemented PagerDuty, they haven’t experienced challenges or obstacles in supporting their incident management resources. Prior to adopting PagerDuty, they had relied on a custom-built alerting tool that didn’t scale to meet their needs as they grew their engineering and customer success teams. The team wanted to ensure that they had the right tools and systems in place to respond to incidents 24/7, and needed a solution that was both highly reliable and able to grow with them as they scaled.
Serving customers 24/7 and surpassing uptime expectations
“It was extremely nice to have PagerDuty from the start. We didn’t have to deal with the main problem areas that most companies do when managing incidents. It was one of the solutions that helped us create an operationally like-minded team,” said Nick Stielau. Pantheon is currently using PagerDuty for alert management, on-call automation with scheduling and escalations, real-time response orchestration, as well as reporting on system-level and operational efficiency metrics. PagerDuty has helped Pantheon operationalize and improve efficiency and collaboration across departments, especially within the engineering and customer success teams. The on-call engineers are responsible for handling and triaging alerts that come from their infrastructure monitoring stacks. Meanwhile, the customer success team is on-call for customer-based tickets and calls, and manages real-time customer communication and outage updates through their status page. Implementing PagerDuty has also allowed the company to provide a functional and positive feedback loop for both customers and their teams. “PagerDuty gives us the ability to serve our global customers 24/7 across both infrastructure and customer-facing issues,” said Stielau.
PagerDuty has hundreds of self-service integrations and extensions with monitoring, ticketing, deployment, and collaboration tools, so that customers can easily customize the ideal incident resolution workflow for any environment. Pantheon in particular utilizes PagerDuty’s integrations with Slack and Sensu. With Slack, on-call engineers and support staff can be notified of, acknowledge, respond to, and collaborate on incidents directly within Slack without having to toggle between tools, and can tag the appropriate teams for additional help. Pantheon also integrated Sensu with PagerDuty to aggregate their customer support requests. PagerDuty enables real-time response orchestration by automatically routing each issue to the right person based on the importance of the service and the severity of the incident, and by escalating the issue to the next line of defense if it isn’t acted on.
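The routing and escalation behavior described above can be illustrated with a minimal sketch. It is not PagerDuty’s API or Pantheon’s actual configuration: the severity names, responder names, and timeouts are all hypothetical, chosen only to show how an unacknowledged incident moves to the next line of defense.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EscalationLevel:
    responder: str        # hypothetical responder or rotation name
    timeout_minutes: int  # time this level has to acknowledge before escalation


# Hypothetical escalation policies keyed by incident severity.
POLICIES = {
    "critical": [
        EscalationLevel("primary-on-call", 5),
        EscalationLevel("secondary-on-call", 10),
        EscalationLevel("engineering-manager", 15),
    ],
    "low": [
        EscalationLevel("primary-on-call", 30),
        EscalationLevel("secondary-on-call", 60),
    ],
}


def current_responder(severity: str, minutes_unacknowledged: int) -> Optional[str]:
    """Return who should be notified for an incident that has gone
    unacknowledged for the given number of minutes, or None once every
    level of the policy has timed out."""
    elapsed = 0
    for level in POLICIES[severity]:
        elapsed += level.timeout_minutes
        if minutes_unacknowledged < elapsed:
            return level.responder
    return None
```

In this sketch, a critical incident that is still unacknowledged after five minutes moves from the primary on-call engineer to the secondary, while a low-severity incident stays with the primary for half an hour.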
The PagerDuty platform helps Pantheon minimize time spent on administrative tasks, and instead frees up teams to direct their focus and energy to resolving issues effectively and innovating solutions. PagerDuty makes it possible for them to continue serving their customers 24/7. “One of our top level business KPI’s is site uptime. PagerDuty is a critical part of the system and processes which help us keep that uptime where we want it to be, resulting in exceeding our 99.9% uptime SLA,” said Stielau.
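To put the 99.9% figure in context, the short calculation below shows how much downtime such an SLA permits. The measurement periods (a 30-day month and a 365-day year) are assumptions for illustration; the source does not say how Pantheon measures its SLA.

```python
def allowed_downtime_minutes(sla_percent: float, period_days: int) -> float:
    """Return the downtime budget, in minutes, that an uptime SLA
    leaves over a period of the given length."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


# Assumed periods: a 30-day month and a 365-day year.
monthly_budget = allowed_downtime_minutes(99.9, 30)
yearly_budget = allowed_downtime_minutes(99.9, 365)

print(f"Monthly budget: {monthly_budget:.1f} minutes")
print(f"Yearly budget: {yearly_budget / 60:.2f} hours")
```

Under these assumptions, a 99.9% uptime SLA leaves roughly 43 minutes of downtime per month, or just under nine hours per year, which is why fast acknowledgment and escalation matter.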
Meeting their commitment to uptime and performance
“A big value Pantheon provides is committing to our customers’ success on a daily basis. PagerDuty helps us meet our commitments to uptime and performance,” said Stielau. Without having PagerDuty to support their incident prevention and resolution process, it would be difficult for the company to serve their customers efficiently and it would add frustration to those responsible for the product and customer experience. PagerDuty helps relieve the stress associated with being on-call: “there is literally always someone you can escalate an incident to. If you’re really indisposed or need the help, PagerDuty helps codify that support,” stated Stielau.