StumbleUpon makes it easy to discover new and interesting pages from all corners of the Internet. Over 30 million people use the service to discover a vast collection of curated content that is personalized to their specific interests. With so many users accessing the service daily, StumbleUpon needs a reliable alerting system to minimize downtime.
To keep attracting users and advertisers, StumbleUpon must keep its website available at all times. The company uses Nagios and Pingdom for monitoring, but neither system provided reliable alerting. Whenever a server failed, one of them would send an email or SMS alert to the on-call engineer. These alerts were easy to miss if the engineer was on the road or asleep, and missed alerts sometimes led to website outages. For StumbleUpon, downtime carries a real financial cost. “We are an ad-supported company,” said Michael Hobbs, Operations Manager at StumbleUpon, “so anytime a user can’t get to our website we aren’t fulfilling our advertising content. It can be expensive.”
StumbleUpon relied on proactive alerting to catch problems before they reached users. However, the company had no way to track alerts across its different monitoring systems, which made it difficult to spot weak areas in its IT infrastructure.
The on-call schedule had its own problems. StumbleUpon tracked its on-call engineers in a manually maintained schedule that became a stressful mess to manage. Every time a substitution was made, a manager had to copy the updated contact information from the schedule into each monitoring system by hand. The process was laborious, error-prone, and consumed far too much of the managers’ time. “Previously I have used a Google Calendar or a wiki, and had to update email addresses in the monitoring systems,” Hobbs said. “It was very painful.”
StumbleUpon turned to PagerDuty to solve these problems. PagerDuty’s wide range of notification methods helped improve mean time to respond: engineers can be contacted by SMS, email, phone calls to multiple numbers, and iOS or Android push notifications. Each engineer decides how they will be alerted and at what intervals, and clear escalation policies ensure that every alert gets a response even if the first person is unreachable. “We had one guy who was a really heavy sleeper,” said Hobbs. “He couldn’t find an SMS sound that was loud enough to wake him so he would have PagerDuty call him four times to make sure he wouldn’t miss any alerts.”
“PagerDuty just makes things easier.”
StumbleUpon uses PagerDuty’s iOS app to reach engineers about incidents when they are on the road or without WiFi. “I really like the iOS app,” Hobbs said. “The ability to acknowledge, escalate, and resolve all incidents while away from the office is really nice.”
“PagerDuty is one of those companies that does what it does without any flaws.”
PagerDuty also gives StumbleUpon a single, central record of alerts from all of its monitoring systems. “It’s amazing that PagerDuty keeps track of the root cause of an incident and has a central place collecting data from all our monitoring systems,” said Hobbs. This data gives StumbleUpon’s engineers the information they need to spot recurring trends and prevent downtime.
StumbleUpon also easily integrated its homegrown monitoring systems directly into PagerDuty. “If someone spots a problem that one of our monitoring systems didn’t pick up, they can shoot a message to our emergency email address and it instantly jumps into PagerDuty so we never miss anything,” said Hobbs.
“PagerDuty makes it so we don’t have to worry about scheduling and we can focus on other aspects of our work.”
On-call schedule changes are now a breeze for StumbleUpon’s managers: they enter a change in PagerDuty once, and the schedule adjusts automatically. PagerDuty’s calendar clearly displays who is on call, how they can be reached, and the escalation policy that will be used if the original on-call engineer is unresponsive.
PagerDuty’s escalation policies route different types of incidents to the appropriate StumbleUpon on-call team: DevOps, operations, or general engineering. When an incident occurs, the alert no longer goes to a single person who must manually escalate it to the right engineer. Instead, it is sent directly to the correct person, reducing mean time to resolution and easing StumbleUpon’s fear of missed alerts.
© 2009 - 2018