SendGrid is a cloud-based customer communication platform that delivers over 25 billion emails each month for Internet and mobile-based customers. The company is headquartered in Colorado and has over 300 employees, 23 of them on the operations team and approximately 84 in the development group. Mary Moore-Simmons, Engineering Operations Manager, is responsible for SendGrid's infrastructure, which includes servers and data centers, the network behind them, virtualization stacks, and backend systems. Because SendGrid sends email at such a high rate, a large number of incident alerts are generated every day. Finding a scalable, enterprise-grade solution to streamline and simplify the manual incident alert process was a top initiative for the company.
Replacing previous alerting tool and overcoming scalability challenges
SendGrid receives up to two thousand incident alerts on a typical day, and tens of thousands per minute during technical incidents or outages. At that volume, the company must address alerts quickly and efficiently. Before moving to PagerDuty, SendGrid used a different vendor for alerting, but realized it needed a full-scale incident management solution to support its high volume of incidents. “When you have a tool in place, you want it to work, especially when there is an outage; that’s when you expect it to work,” said Moore-Simmons. Faced with these scalability challenges, SendGrid decided to move to a reliable and scalable incident management solution.
Reducing MTTA and MTTR by switching to a new incident management platform
SendGrid implemented PagerDuty as its new incident management solution and uses the platform for collaboration, scheduling, escalation, and reporting. An on-call user can acknowledge an incident alert, escalate it if needed, or resolve the issue at hand, and then move directly to the next incident without delay. The main dashboard, which shows all incidents, is another critical benefit for SendGrid. “The way PagerDuty’s incident management dashboard’s UI is designed allows you to see what’s going on and what kind of alerts you are receiving. This is super helpful for us; no more having a list of alerts moving around at all times and losing focus on them,” said Moore-Simmons.
Moore-Simmons considers PagerDuty’s reporting feature the most important asset for her role. Reporting on metrics gives her insight into the number of alerts per day, week, month, and year. “We had an estimate of 78,000 alerts happen this year and the company’s goal was to reduce the number of alerts by 50% compared to 2015. So far, we are on track with this metric, thanks to the support of PagerDuty,” stated Moore-Simmons. She also determined that the team’s mean time to repair (MTTR) is 19 minutes, while its mean time to acknowledge (MTTA) is only 2 minutes. This information helps Moore-Simmons and the other engineering managers identify what’s working, what’s not, and how to fix problems.
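MTTA and MTTR figures like the ones in this case study are averages over incident timestamps. As a rough illustration of that arithmetic, here is a minimal Python sketch. It assumes a hypothetical `Incident` record holding trigger, acknowledge, and resolve times; this is not PagerDuty's data model, API, or reporting implementation. It also assumes MTTR is measured from trigger to resolution, although some teams measure it from acknowledgement instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    # Hypothetical record layout for illustration only; not PagerDuty's schema.
    triggered: datetime
    acknowledged: datetime
    resolved: datetime


def mean_minutes(deltas):
    """Average a list of timedeltas and return the result in minutes."""
    return mean(d.total_seconds() for d in deltas) / 60


def mtta(incidents):
    # Mean time to acknowledge: time from trigger to acknowledgement.
    return mean_minutes([i.acknowledged - i.triggered for i in incidents])


def mttr(incidents):
    # Mean time to repair, assumed here to run from trigger to resolution.
    return mean_minutes([i.resolved - i.triggered for i in incidents])


if __name__ == "__main__":
    # Invented sample values, used only to show the arithmetic.
    t0 = datetime(2024, 1, 1, 9, 0)
    sample = [
        Incident(t0, t0 + timedelta(minutes=1), t0 + timedelta(minutes=10)),
        Incident(t0, t0 + timedelta(minutes=5), t0 + timedelta(minutes=20)),
    ]
    print(f"MTTA: {mtta(sample):.1f} min")
    print(f"MTTR: {mttr(sample):.1f} min")
```

With the invented two-incident sample, the script prints an MTTA of 3.0 minutes and an MTTR of 15.0 minutes.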
The biggest benefit for SendGrid is that, thanks to reliable and rapid incident notifications, its operations and development teams can now resolve outages quickly and prevent them from recurring. Every minute of an outage costs the company thousands of dollars and results in poor customer experience and customer churn; with fewer outages, churn has decreased. The team is also more satisfied and productive since switching to PagerDuty.
Enhancing employee productivity and improving scalability
SendGrid relies on PagerDuty as a trustworthy solution for its use cases, critical alerts, and scheduling. “We have confidence in PagerDuty and no longer have to worry about unnecessarily long outages and revenue loss. Everyone on-call at SendGrid uses PagerDuty and knows the solution as an established provider,” said Moore-Simmons. Employees are happy and productive, which is important to the business. Overall, the company has seen many advantages since switching to PagerDuty, including faster resolution of outages, increased employee productivity and satisfaction, and strong bottom-line metrics that attest to its operational efficiency.
“PagerDuty helps us respond faster to the alerts that we receive. We’re able to diagnose outages faster, which in turn improves the experience of our customers and reduces downtime as well as any associated costs.”