Cloud-Powered Application Incident Resolution Google as an organization amazes me every day. The great things they do that impact people’s lives from the individual to the…

Operational Intelligence Meets the Fastest Path to Incident Resolution Splunk Enterprise is the industry-leading platform for machine data. It collects all your machine data from…

| In Alerting, On-Call Life

Have you ever caught a ticket that you just couldn’t figure out? You spend hours on Google, slowly reading the entirety of Stack Overflow, while…

| In Alerting, On-Call Life

While I’ve had an interest in computers for almost as long as I can remember, it wasn’t until I was a freshman in college that…

Incident management is a key facet of supporting applications. When working on an application, we spend the vast majority of time on its release to…

| In Alerting, On-Call Life, Operations Performance

It generally pays to look beyond labels, such as “incident management” (which usually means much more than receiving and responding to alerts). Consider, for…

Accidents Happen It’s a fact: well-meaning team members, in the heat of the moment – and often in the middle of the night – sometimes…

| In Alerting, Community, On-Call Life

The Roman Pillar of Justice Hackday is my favorite day of the month. Yet, it’s becoming increasingly difficult to win as PagerDuty hires more employees….

Many DevOps companies embrace risk, but fear of failing is hard-wired into most of us. Here are 3 ways to handle an emotional reaction to failure.

| In Community, Events, On-Call Life

I am not always a fan of “Women in Tech” events. So imagine my surprise to find myself spearheading a Women’s Leadership Circle here at PagerDuty. How do women-focused career events actually help?

Democracy: the great experiment. The voice of the people leading. The end of rigid and overbearing hierarchies. These principles have been with us for over two centuries in government, but many business models still look like the British Empire. As the pace of development continues to scale and customers come to expect real-time response to their concerns, businesses with complex IT departments are transitioning to a DevOps model that gives them the agility to stay up and responsive to the voice of the people. Here we explore how fostering a DevOps culture can build a more democratic workplace and customer experience.

Operations teams are receiving more telemetry data from monitoring systems than ever before. But they are struggling to sift through this data to find what really matters – resulting in alert fatigue and missed alerts. For this reason, we’re proud to announce that long-time partner Event Enrichment HQ is joining the PagerDuty family to deliver the industry’s first integrated event management and incident resolution platform. Adding Event Enrichment HQ and its keystone product, the Event Enrichment Platform (EEP), to PagerDuty helps you quiet your noisy monitoring systems, reduce alert fatigue, and slash your incident resolution times.

No one should need to be convinced of the value of good data. It gives you the confidence to make decisions quickly and with less risk, it allows you to measure your success, and it lets you know when you need to adjust your course. But there’s a difference between knowing the value of data and creating a culture around it. A data-driven culture is one where everyone quantifies their actions as much as possible and asks how their team is having a tangible impact on the business. It turns your entire organization into a squad of analysts. But fostering a data-driven culture isn’t always easy. Here are five steps that will help you get there.

| In Alerting, On-Call Life

Something goes wrong in your staging environment, and you start seeing “CRITICAL” or “ERROR” all over the place. Oh… I forgot to mention that it’s 3am where you live. Is it really “critical” in that moment? Well, technically it is. The environment is still busted. But do you want to fix it now? Is it urgent?

| In On-Call Life

We know that alert fatigue is a big concern for our users. When everything is important, nothing is important. But “non-critical” is not the same thing as “insignificant”; in fact, non-critical issues are often indicative of a larger problem down the road. So now, with Incident Urgencies, users can confidently track all events, and only get woken up for the most important ones.
A big part of what has made PagerDuty useful for our customers is analytics, and being able to see what’s going on with events across all of their systems and monitoring tools. Keeping non-critical events out of PagerDuty means those analytics are only telling part of the story. And the more data you have, the easier it is to prevent incidents from occurring in the future.

Too many companies take the happiness of their engineers for granted. This is a huge mistake, especially since engineers are doing important work for your company: building your product, and then keeping it up-to-date and functioning. Their morale has a direct influence on their performance, and, by extension, your product. Part of the DevOps ethos is getting engineers working together better, smarter, and happier. But why should executives care about that?

| In Alerting, On-Call Life

Using ticketing systems can be fraught with issues: a clunky workflow, mired in process, means that users can’t always move and adapt quickly. While ticketing systems are a great way to manage a queue of ongoing requests, we’ve noticed that many operationally mature companies stay away from them for real-time incident management. Instead, they use a more lightweight solution, like PagerDuty. A lightweight solution with a focus on automation allows them to be more agile and get things done faster.

| In Alerting, Announcements, Community, Features, On-Call Life

We’re pleased to announce our fourth major mobile release, which brings some significant improvements to the performance and usability of key parts of the app. With all these changes, it’s faster and easier than ever to see, investigate, and take action on problems in your system — driving down resolution time and helping your team improve your operations performance.