Fall 2021 Launch: Automate Incident Response to Accelerate Critical Work

by PagerDuty November 16, 2021 | 6 min read

Modern businesses are digital businesses—so managing your business means mastering your critical services and operations for your employees and customers. Today, you need to be able to understand every aspect of your company—as it unfolds—because in this world, seconds matter to your productivity, your revenue, and most importantly, your customers.

The PagerDuty Operations Cloud, a cloud-based platform that manages all aspects of urgent and mission-critical work for a modern digital business, continues to evolve to help keep the world always-on. To meet these ever-changing digital demands, we’re excited to offer our customers this latest release, which empowers organizations to automate incident response wherever and whenever possible and to accelerate critical work. We’re pleased to show you how PagerDuty helps you connect everything, automate everywhere, and deliver flexibility to your teams.

Watch our launch webinar to see how our latest innovations can help your business.

Connect Everything

For a business to thrive, teams must manage a complex ecosystem of digital services, infrastructure, and customer-facing experiences. There’s little room for error or downtime. To keep pace with rapid growth, organizations must master automated incident response, not just to mobilize technical teams, but also to give stakeholders and business leaders the business-critical information they need to keep their customers happy.

Because modern digital operations span interconnected services and specialized tool stacks, knowing how your services connect and affect each other is more important than ever. PagerDuty has extended its leadership in full service ownership to give teams real-time visibility into the information and services they need.

Service Standards

  • Service ownership works best when everyone is on the same page. Service Standards enables account owners to configure and enforce best-practice standards at scale to drive hygiene across distributed teams, without slowing down innovation. Organizations can define, share, and track the criteria for service configuration according to their unique needs, so individual teams have clear guidelines for setting up and managing services within PagerDuty.
  • Sometimes a picture is worth a thousand words…or a thousand lines of service status updates. Dynamic Service Graph allows users to instantly discover, map, and visualize business and technical service dependencies across their digital ecosystem. You can view the health of your services at a glance and assess the impact radius of an incident. You can also zero in on probable cause, troubleshoot, escalate, and resolve from right within the dynamic interface.
  • When seconds count, Global Search allows users to find attributes corresponding to incidents, alerts, services, and schedules in a flexible and easy-to-use way. In a convenient, centralized location, teams can quickly retrieve the incident details and context they need.
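
To make the idea of an incident's impact radius more concrete, here is a minimal Python sketch. It is only an illustration and does not reflect how Dynamic Service Graph is built: the service names, the `DEPENDS_ON` map, and the `impact_radius` function are all hypothetical. It assumes each service lists the services it depends on, then walks those links in reverse to find everything affected when one service fails.

```python
from collections import deque

# Hypothetical dependency map: each service lists the services it depends on.
DEPENDS_ON = {
    "checkout": ["payments", "cart"],
    "payments": ["database"],
    "cart": ["database"],
    "search": ["index"],
    "database": [],
    "index": [],
}


def impact_radius(failed, depends_on):
    """Return every service that directly or transitively depends on `failed`."""
    # Invert the edges so we can ask "who depends on this service?"
    dependents = {name: [] for name in depends_on}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)

    # Breadth-first walk outward from the failed service.
    impacted = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in impacted:
                impacted.add(parent)
                queue.append(parent)
    return impacted


if __name__ == "__main__":
    print(sorted(impact_radius("database", DEPENDS_ON)))
```

In this toy map, a failure in `database` reaches `payments`, `cart`, and `checkout`, while `search` is unaffected because it only depends on `index`.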


Automate Everywhere

Automation should be a critical priority for digital teams. Increased customer demands caused by today’s digital acceleration make operating in real time even more important. The trick is to democratize access to automated digital operations so that individuals and teams can get their work done quickly, rather than having to wait for overtaxed experts to answer their escalations. Automate wherever possible so that your teams reduce toil and lag and can spend their time innovating and growing your business. Rundeck’s new products help your teams automate, standardize, and safely delegate operations to resolve incidents faster and speed up your operations.

Rundeck Cloud

  • Rundeck Actions empowers responders to run automated diagnostics on impacted systems, and even remediate incidents themselves right within PagerDuty. Automation engineers can improve productivity and reduce escalations to experts by automating repeated diagnostic steps and frequent remediation actions. To jumpstart customers, we’ve prepared a packaged solution that includes an automation configuration accelerator that can have responders running automation in Rundeck Actions in just a few days.
  • Rundeck Cloud helps automation engineers and central operations teams maximize agility by putting real-time standardized automated actions into the hands of stakeholders such as operators, developers, and end-users. Now engineers can author self-service automated processes without having to deploy or administer a Rundeck cluster. Rundeck Cloud securely connects to any remote infrastructure behind firewalls or within VPCs. Rundeck Cloud helps automation users get started faster, scale elastically, and ensure availability, all while staying up to date with the latest versions in a highly secure deployment.


Deliver Flexibility

DevOps and Central IT teams of all sizes face three universal challenges that come with digital transformation: 1) reducing noise, 2) analyzing root cause, and 3) reducing toil. Yet no two teams approach incident response in the same way. A digital operations platform must provide the flexibility and connectivity required to adapt to each team’s bespoke tech stack, culture, and processes, so they can act in real time and effectively drive incident response from ingestion to resolution.

We’re adding a set of features to PagerDuty’s Event Intelligence product that gives users new noise reduction, root cause analysis, and automation capabilities, helping our customers reduce downtime with fewer incidents and faster resolution:

Event Orchestration

  • Customers have been asking us to let them do more with our event rules, and the team has been hard at work to make this happen. Our new Event Orchestration feature cuts down on manual event processing with a powerful decision engine, where teams can create custom logic to enrich, modify, and control routing based on event conditions at scale. By combining nested event rules with machine learning and precise, targeted automation to trigger actions like diagnostics and remediation, users can boost operational efficiency and reduce toil.
  • When responders first get to an incident, it can be hard to know where to focus. Probable Origin jumpstarts your response efforts with an auto-generated list of likely incident origin points for faster resolution. It uses historical data from correlated incident patterns to surface where (and where not) to look first when troubleshooting major incidents.
  • Nothing is more frustrating than getting interrupted (or worse, woken up) by something that self-heals and never needed to be looked at in the first place. Auto-Pause Incidents automatically removes unnecessary noise from flapping alerts with the click of a button. PagerDuty uses machine learning to detect and pause transient alerts that historically auto-resolve so that responders can stay focused on the work that matters.
  • Responders are ack-ing while walking their dog or out with their kids, so having relevant information within reach is extra handy. Change Events for Mobile delivers machine learning-powered intelligence right to the palm of your hand. With the latest context available at a glance right in the incident details on mobile, responders can identify potential change correlation, triage incidents quickly, and reduce time-to-resolution while on the go.
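
As a rough illustration of the nested event rules behind Event Orchestration, the Python sketch below shows one way such a decision engine could enrich an event and choose a route. It is a simplified model under stated assumptions, not PagerDuty's implementation or API: the `Rule` class, the `orchestrate` function, the event fields, and the route names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Rule:
    """Hypothetical rule: a condition, optional enrichment, optional route, nested rules."""
    condition: Callable[[dict], bool]
    enrich: dict = field(default_factory=dict)
    route_to: Optional[str] = None
    children: list = field(default_factory=list)  # nested Rule objects


def orchestrate(event, rules, default_route="unrouted"):
    """Apply the first matching rule at each level, then descend into its children."""
    enriched = dict(event)  # copy so the caller's event is not mutated
    route = default_route
    current = rules
    while current:
        match = next((r for r in current if r.condition(enriched)), None)
        if match is None:
            break
        enriched.update(match.enrich)
        if match.route_to is not None:
            route = match.route_to
        current = match.children
    return enriched, route


# Hypothetical rule set: database events go to one service, and critical
# database events are further enriched and routed to a major-incident service.
RULES = [
    Rule(
        condition=lambda e: e.get("source") == "database",
        enrich={"team": "data-platform"},
        route_to="database-service",
        children=[
            Rule(
                condition=lambda e: e.get("severity") == "critical",
                enrich={"priority": "P1"},
                route_to="database-major-incident",
            ),
        ],
    ),
    Rule(condition=lambda e: e.get("source") == "web", route_to="web-service"),
]

if __name__ == "__main__":
    print(orchestrate({"source": "database", "severity": "critical"}, RULES))
```

The sketch stops at the first matching rule on each level, which keeps routing predictable. A real decision engine may evaluate conditions differently, so treat the matching order here as an assumption.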

Watch the launch webinar to hear more and see these features live in action.