Turn any signal into insight and action. See how PagerDuty Digital Operations Management Platform integrates machine data and human intelligence to improve visibility and agility across organizations.
Connect insights to real-time action by aligning teams through the shared language of business impact.
Check out the latest products we’ve been working on—including event intelligence, machine learning, response automation, on-call, analytics, operations health management, integrations, and more.
Digital Operations Management arms organizations with the insights needed to turn data into opportunity across every operational use case, from DevOps, ITOps, Security, Support, and beyond.
Over 300 Integrations
Discover DevOps best practices with our library of webinars, whitepapers, reports, and much more.
Learn best practices and get support help with resources from our award-winning support team.
See how PagerDuty works with our live product demo — twice a week, every week.
We've created a maturity model to assist on the journey to digital operations excellence. Take our short assessment to find out where your team falls!
Interactive, simple-to-use API and technical documentation enables users to easily try updates and extend PagerDuty.
Engage with users and PagerDuty experts from our global community of 200k+ users. Become a member, connect, and share insights for success.
Get all your PagerDuty-related questions answered by exploring our in-depth support documentation and community forums.
Have you ever worked on a team where it was a challenge to give constructive feedback or confidently share ideas? At PagerDuty Summit 2018, Patrick...
PagerDuty helps organizations transform their digital operations. Learn more about PagerDuty's mission and what we do.
Meet our experienced and passionate executive team.
We are risk-taking innovators dedicated to delivering amazing products and delighting customers. Join us and do the best work of your career.
With the PagerDuty Foundation, we are committed to doing our part in giving back to the community.
Guest blog post by Elle Sidell, Lukas Burkoň, and Jan Prachař Testomato. Testomato offers easy automated testing to check websites pages and forms for problems that could potentially damage a visitor’s experience.
We all know how important monitoring is for making sure our websites and applications stay error free, but that’s only one part of the equation. What do you do once an error has been found? How do you decide what steps to take next?
Rating the severity of an issue and making sure the right person is notified plays a big role in how quickly and efficiently problems get resolved. We’ve pulled together a quick guide about the importance of error severity and how to set severity levels that fit your team’s escalation policy.
In simple terms, the severity of an error indicates how serious an issue is depending on the negative impact it will have on users and the overall quality of an application or website.
For example, an error that causes a website to crash would be considered high severity, while a spelling error might (in some cases) be considered low.
Severity is important because it:
Having a severity alert process in place can help you prioritize the most crucial problems and avoid disturbing the wrong people with issues that are outside their normal area of responsibility.
On a larger scale, it makes decisions about what to fix easier for everyone.
Understanding the benefits of rating the severity of an incident is easy, but creating a severity process that works for your team can be tricky. There’s no silver bullet for this process. What works for you many not work for another team – even if it’s the same size or in the same industry.
How you choose to set up your severity levels can vary depending on your team, the project and its infrastructure, the organization of your team, and the tools you use. So where do you start?
In our experience, there are 3 main things you need to think about when creating an escalation process:
Errors with higher severity will naturally require a more reliable notification channel. For example, you might choose to send an SMS using PagerDuty for a high severity error, while one that is considered minor may not trigger an alert to help reduce noise. Instead, you could choose to leave it as a notification from Testomato, which can be viewed by someone at a later time.
1) Severity Structure
One of the easiest ways to set up a severity structure is to identify the most critical parts of your website or application based on their business value.
For example, the most critical parts of an e-shop would be its product catalogue and its checkout. These are the features that would cause an e-shop to severely affect business if they were to stop working. These issues need to be prioritized before all other issues.
Here’s one method we’ve found helpful for creating a severity structure:
There’s no right or wrong way to do this. The most important thing to know is how your team will classify specific incidents and make sure that everyone is on the same page.
2) Organization Structure
The next thing you’ll want to do is take a look at the structure of your team.
Having a clear understanding of how your team is structured and automating issue communication will help you define a more efficient flow of communication later on. For instance, team members responsible for your environment should be notified about issues immediately, while a project manager may only want to be kept in the loop for critical issues so they’re aware of possible problems.
Based on what we’ve seen with the project teams at Testomato, development teams are usually structured according to the following table:
*A small team would generally be found in a web design agency or early stage startup.
For a more detailed structure, here’s a few more questions to keep in mind:
3) Communication Structure
One of the hardest parts of severities can be putting together a communication structure, especially if you don’t have a strong idea about how alerts should flow through your team structure.
Think of it this way:
The main goal of severity levels is to make sure the right people are aware of issues and help prioritize them. Setting a communication structure lets you connect different levels of your severity structure to roles from your organization and add more defined actions based on time sensitivity or error frequency. This way you can guarantee the right people are contacted using the proper channel that is required for the situation. If a responder is not available, there is an escalation path to ensure someone else on the team is notified.
Assigning notification channels and setting thresholds that correspond to your team organization means that problems are handled efficiently and only involve the people needed to solve them.
For example, if a critical incident occurs on your website, an admin receives a phone call immediately and an SMS is sent to the developer responsible for this feature at the same time. If the problem is not resolved after 10 minutes, the team manager will also receive a phone call.
On the other hand, a warning might only warrant an email for the team admin and any relevant developers.
Within PagerDuty, you can create 2 Testomato services – one general and another that is critical – and match these services to the escalation policy needed. If you have SLAs of 15 minute for critical incidents, that escalation path with be tighter than general incidents.
Here’s a basic overview of how we use severity levels at Testomato using both PagerDuty notifications and our own email alerts:
Team Members: manager, 2 admins (responsible for production), and 2-3 developers.
When errors occur on their project using the following process:
PagerDuty – SMS and Phone Call
Testomato – Email
We hope you’ve found this post helpful! What severity alert process works best for your team?
I love writing software, but I hate dealing with bugs. They take you away from what you want to be doing and often lead you...
A few weeks ago, I had the pleasure of attending PagerDuty Summit 2018 as Zenoss was a proud partner and sponsor for the conference. It...
600 Townsend St., #200
San Francisco, CA 94103
905 King Street West, Suite 600
Toronto, ON, M6K 3G9, Canada
1416 NW 46th St., St. 301
Seattle, WA 98107
5 Martin Place
1 Fore St,
London EC2Y 9DT
© 2009 - 2019