Customer Perspective: Setting Up IT Operations Software for Startups

- June 2, 2015

This is a guest blog post written by Anthony Gibbons, the Operations Manager at Airhead Education. Anthony gives his perspective as a startup setting up PagerDuty as their IT Operations Software: “With the advent of cloud services and companies willing to integrate with each other, it is now entirely possible for a small startup to use the same monitoring tools as industry stars such as Airbnb, Pinterest and Path… It probably took me an hour to integrate all of my services with PagerDuty.”

CloudMonix and PagerDuty Join Hands for Next-Gen Cloud Monitoring

- May 19, 2015

CloudMonix’s core objective is to simplify, streamline and automate routine and complex tasks for cloud system administrators and IT professionals, so we are always on the lookout for ways to improve how we deliver our services. That’s why we have partnered with PagerDuty to deliver instant alerts and notifications through PagerDuty’s leading incident management platform.

Gain Greater Context with Rich Incidents

- May 13, 2015

The site is down. Alarms are going off. Before you can fix anything, you first have to understand what’s going on. And gaining context can be hard as you look across multiple systems and metrics. We’re pleased to announce Rich Incidents, a new feature for PagerDuty that helps incident responders gain additional context. Now, responders can go straight from an alert to a conference bridge, chat room, or runbook, giving them instantaneous access to each other and to any documentation they might need. Additionally, embedded graphs give more context into an incident, helping you respond faster and maintain a dependable product for your customers.

The Discovery of Apache ZooKeeper’s Poison Packet

- May 7, 2015

ZooKeeper, for those who are unaware, is a well-known open source project which enables highly reliable distributed coordination. It is trusted by many around the world, including PagerDuty. It provides high availability and linearizability through the concept of a leader, which can be dynamically re-elected, and ensures consistency through a majority quorum. The leader election and failure detection mechanisms are fairly mature, and typically just work… until they don’t. How can this be? Well, after a lengthy investigation, we managed to uncover four different bugs coming together to conspire against us, resulting in random cluster-wide lockups. Two of those bugs lay in ZooKeeper, and the other two were lurking in the Linux kernel. This is our story.

Boundary Integrates with PagerDuty

- May 5, 2015

When it comes to monitoring the health of your IT system, the team at Boundary lives by the philosophy that every second counts. Rather than letting data slip through the cracks at five- or even one-minute intervals, Boundary provides real-time monitoring of servers, platforms and apps for IT and DevOps teams with one-second resolution. We’re excited to announce an integration with PagerDuty to help teams resolve infrastructure incidents even faster.

Report from ServiceNow Knowledge 15

- April 30, 2015

Last week, we sponsored a booth and participated with all of ServiceNow’s awesome partners at Knowledge 15 in Las Vegas! ServiceNow is a powerful platform-as-a-service for IT teams. We heard success stories from customers who use PagerDuty to enhance their ServiceNow experience.

Best Practices in Outage Communication: Internal Stakeholders

- April 29, 2015

When you’re in the middle of an outage, the last thing you want is people from all over the company constantly asking you when it’s going to be fixed. Your job is busy enough without having to play translator and communication whiz when you have more important things to be worried about. But at the same time, your outage affects people outside of your team. You can’t neglect communicating with internal stakeholders like your manager, or your CTO, or your CEO, or your marketing department, or your customer support team. You see where I’m going with this. So how do you keep your internal stakeholders informed in a timely, efficient fashion?

Introducing PagerDuty Integration for Threat Stack

- April 28, 2015

We love PagerDuty and are big users ourselves. We love the ease of integration with our other platforms. We love the scheduling and overrides. We love the per-service escalation groups. We love the sound of our default alert setting, the sad trombone. (Though, the more we think about it, “love” isn’t the right word on that last one. That infernal trombone wakes up our team to let us know there is trouble in the Cloud. We dread that trombone.)

London Conference Wrap-Up

- April 24, 2015

Last week our team went on an overseas adventure, sponsoring AWS Summit London and Puppet Camp UK. We heard over and over at AWS Summit that our international customers love our reliable multi-provider SMS, phone, push, and email alerting to over 175 countries (and growing!). Our international SMS alerts all come from local numbers in the countries we alert, so when engineers acknowledge an alert, they don’t incur international fees. International customers are also big fans of UTF-8 support throughout our incident pipeline, so messages in non-Western character sets render correctly.
