PagerDuty Blog

IT Operations Health — Visualized

IT Operations professionals today require infrastructure-wide context to effectively remediate incidents, decrease non-actionable alerting, and continuously improve incident management capabilities. With the proliferation of microservice architectures, applications are rapidly growing in complexity and are generating ever more telemetry. These trends compound the difficulty in gaining broad operations health awareness and understanding business impact. As a result, incident responders often lack visibility into the blast radius of incidents.

To solve this problem, PagerDuty has released the Infrastructure Health Application, a core Intelligence Application powering the Operations Command Console.


The Ultimate Timeline

The Infrastructure Health Application provides a visual overview of all of the alert clusters across the services and hosts in your IT infrastructure. You can use these visualizations not only to aid incident response, but also to improve the overall health and performance of your applications.

    • During the Firefight

      The Infrastructure Health Application updates in real time, ticker-taping from right to left as alerts and events roll in.

      Responders can consult the Infrastructure Health Application during an incident to quickly assess the scale of the issues at hand. For example, is a single service down? Or are you facing a multi-service, cascading incident that will require you to marshal additional teams and resources?

    • During a Postmortem

      Once the dust has settled, you need to figure out why things went wrong and how to ensure that the same incident does not recur.

      Reviewing the timeline of events in the Infrastructure Health Application leading up to the incident designation can yield unique insights. Were there leading-edge indicators of the incident? Is your alerting properly configured, or did this incident occur without forewarning?

    • Proactive Deduction  

      Find patterns in your infrastructure data and spot leading-edge indicators of issues before they develop into incidents.

      You can also improve your alerting by identifying particularly active services that are continually paging your team. Lastly, you can pivot your Infrastructure Health visualization by source for a completely different perspective on your data.
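
The two questions raised above, how many services an incident touches and which services page most often, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how the Infrastructure Health Application works internally: the `(service, fired_at)` record shape, the sample data, and both function names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical alert records: (service name, time the alert fired).
# This shape is an assumption for illustration, not a PagerDuty export format.
alerts = [
    ("checkout", datetime(2024, 1, 1, 12, 0)),
    ("checkout", datetime(2024, 1, 1, 12, 2)),
    ("search", datetime(2024, 1, 1, 12, 3)),
    ("checkout", datetime(2024, 1, 1, 9, 0)),
]


def noisiest_services(alerts, top_n=3):
    """Return up to top_n (service, alert_count) pairs, busiest first."""
    return Counter(service for service, _ in alerts).most_common(top_n)


def services_alerting_in_window(alerts, start, window):
    """Return the set of services that alerted in [start, start + window)."""
    end = start + window
    return {service for service, fired_at in alerts if start <= fired_at < end}


if __name__ == "__main__":
    # Which services page the team most often?
    print(noisiest_services(alerts))
    # How many services are involved in the five minutes after 12:00?
    incident_start = datetime(2024, 1, 1, 12, 0)
    print(services_alerting_in_window(alerts, incident_start, timedelta(minutes=5)))
```

A single service in the window suggests a contained issue, while several suggest a wider cascade; a service that dominates the counts over a long period is a candidate for alert tuning.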

By leveraging other PagerDuty features, you can take full advantage of the Infrastructure Health Application’s visualizations.

      • Services Group: Use PagerDuty’s Services Group feature to map your PagerDuty services to your business-critical services.
      • Custom Event Transformer: Bring in additional event context, such as deploys and tweets, using our Custom Event Transformer. Juxtaposing these events with your alert clusters is a powerful way to determine the root cause of incidents and understand customer impact.

The Infrastructure Health Application is one of the many applications in the Operations Command Console. Learn more about the console and the other applications that comprise it in our Operations Command Console blog post.

Sign up for your trial today and accelerate your incident response using our new Operations Command Console, Infrastructure Health Application, and more!