In our always-on, IoT-enabled, cloud-connected, big data age, we face a major paradox: it’s now easier than ever to collect large amounts of data — yet the more data we collect, the harder it becomes to monitor situations effectively. This problem is similar to what psychologists call “information overload” — the phenomenon that causes someone […]

| In Alerting, Features

For many of our customers, reducing alert noise is a difficult yet rewarding task. Cleaning up your alerting means fewer late-night pages and happier team members. But this task can feel a lot like yak shaving if you don’t have the proper tools. In this post, I’m going to run through an effective workflow […]

| In Alerting, Monitoring

Avoiding Noise in Incident Management

Suppression. According to the thesaurus, this word is synonymous with terms like deletion, elimination, and annihilation. Yet within the context of incident management, suppression means something quite different. It’s not about getting rid of data forever. It serves instead as a way of making sure that admins focus on the […]

| In Alerting

Siloed responsibilities have wreaked havoc on team communications, making it difficult for different departments to have the full context of a situation during firefights. This has not only reduced the quality of communication across entire development teams, it has also created a serious issue that plagues many on the operations side: alert fatigue. Alert […]

| In Features, Operations Performance

We get it. You hate getting alerts. As Jason Floyd, Senior DevOps Manager at Real Networks, put it, “I love you and I hate you. PagerDuty makes my job easier and wakes me up at 4 AM.” You are (kind of) OK with getting woken up at 2 AM if your server is really on fire. But […]

Many solutions offer email alerts to notify customers of an issue. Email alerts are effective if you’re in front of your inbox all day, but the reality is we usually aren’t. Missed alerts extend outages and impact your company’s revenue and customer loyalty. To know about issues quickly, thousands of customers have chosen PagerDuty for […]

| In Reliability

Our customers frequently ask us whether PagerDuty uses PagerDuty. The answer is simple: yes. While we could end the blog post here, we thought we would dive a bit deeper and give you an inside look at how we use our own service to stay available. PagerDuty Using PagerDuty, It’s Pretty Meta […]

| In Alerting, Operations Performance

Earlier this month at Nagios World Conference North America, Arup Chakrabarti, Operations Engineering Team Lead at PagerDuty, gave a talk on “What You Should Monitor and Alert on in a Production System” and discussed how to pick out useful metrics for actionable alerting. In case you missed it at the conference, we wanted to share a […]