It generally pays to look beyond labels, such as “incident management” (which usually means much more than receiving and responding to alerts). Consider, for example, the relationship between incidents and technical debt. It is a relationship that most software professionals probably haven’t even thought about, but it exists, and it is more than just a passing acquaintance.
Although new or recently revised code accounts for the majority of software errors, tracing the problems caused by those changes will very often lead back to older sections of code that contain technical debt.
This shouldn’t really be surprising. Technical debt is, by definition, code that contains built-in problems — in design, execution, integration with the rest of the program, and, very often, a combination of these factors. Later changes to code that interacts with technical debt, either directly or indirectly, can expose or amplify those problems.
Why? Consider the conditions under which programmers are likely to add technical debt. Typically, there’s a problem that needs to be taken care of quickly, and speed matters more than taking care of the issue the right way. It may be an emergency bug fix, a change to accommodate an operating system update, new features added under a tight deadline, code from another source being patched in, or simply a quick workaround to accommodate previous technical debt. When the code is added, it’s cleaned up and debugged to the point where it doesn’t cause any errors, but it isn’t up to contemporary standards for design or coding. That’s why it’s technical debt, and not just new code.
This means that it isn’t likely to be bulletproof, and its bug fixes and error handling are likely to be improvised and patched together. It’s like building a bridge with a badly designed truss or weak girders. The problem spots may be OK at first, but with added traffic or later structural changes, the probability of failure increases. In the same way, later revisions of your software may stress the parts of your code that contain technical debt beyond their limits.
Where does incident management come in? While not all incidents require analysis and revision of source code, many of them do. The point at which code is being revised is also the most obvious time to eliminate any technical debt that it contains. Even when the incident response itself doesn’t require any changes to the software, it can result in the discovery of previously unrecognized debt, which can then be scheduled for revision. Incident management can also serve as a warning and detection system for underlying problems in software design and coding. Repeated problems involving the same block of code are a good indication of problems with the code itself.
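As a rough illustration of the "repeated problems" signal described above, here is a minimal Python sketch that counts incidents per code component and flags the ones that recur. The `Incident` record, the `component` labels, and the threshold of three are all hypothetical choices made for this example; they do not come from any particular incident management tool.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    """Hypothetical incident record; real tools will expose different fields."""
    incident_id: str
    component: str  # assumed label for the block of code implicated


def repeat_offenders(incidents, threshold=3):
    """Return {component: count} for components with at least `threshold` incidents."""
    counts = Counter(incident.component for incident in incidents)
    return {name: n for name, n in counts.items() if n >= threshold}


# Made-up sample data for illustration only.
incidents = [
    Incident("INC-1", "billing/invoice.py"),
    Incident("INC-2", "auth/session.py"),
    Incident("INC-3", "billing/invoice.py"),
    Incident("INC-4", "search/index.py"),
    Incident("INC-5", "billing/invoice.py"),
    Incident("INC-6", "search/index.py"),
]

flagged = repeat_offenders(incidents, threshold=3)
print(flagged)
```

A component that shows up in this list is not proven to contain technical debt, but it is a reasonable candidate for a closer look.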
If technical debt is currently (or potentially) a significant issue with your software, you may want to adopt an overall policy and a formal framework for the elimination of technical debt. A technical debt policy could cover the following general areas:
The framework for carrying out such a policy might include components such as these:
There are several points at which such a framework would benefit from being tied in with an incident management system, particularly by means of that system’s API. For example, incident reports could be exported to the application used to map debt, both to correlate incidents with known problem areas and to map newly identified technical debt. Incident management tool APIs can also be used to log incidents involving technical debt and to automatically generate work orders for remedying that debt. Those tools could also be used to alert the developers who are responsible for handling technical debt in specified areas of the code.
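The correlation and work-order steps described above might look something like the following Python sketch. It assumes incidents have already been exported as simple dictionaries with `id` and `component` keys, and that the debt map is an in-memory dictionary keyed by component; `DebtItem`, `WorkOrder`, and `correlate` are hypothetical names invented for this example, and the call to a real incident management API is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class DebtItem:
    """Hypothetical entry in a technical debt map."""
    component: str
    owner: str          # developer responsible for this area of the code
    description: str
    incident_ids: list = field(default_factory=list)


@dataclass
class WorkOrder:
    """Hypothetical work order for remedying a piece of debt."""
    component: str
    assignee: str
    summary: str


def correlate(incidents, debt_map):
    """Match exported incidents against known debt.

    Returns (work_orders, unmapped_ids). Incidents in unmapped_ids touch
    components with no known debt and may point to newly identified debt.
    """
    work_orders, unmapped_ids = [], []
    for incident in incidents:
        item = debt_map.get(incident["component"])
        if item is None:
            unmapped_ids.append(incident["id"])
            continue
        item.incident_ids.append(incident["id"])  # log the incident against the debt
        work_orders.append(
            WorkOrder(
                component=item.component,
                assignee=item.owner,
                summary=f"{incident['id']}: revisit {item.description}",
            )
        )
    return work_orders, unmapped_ids


# Made-up sample data for illustration only.
debt_map = {"billing": DebtItem("billing", "alice", "emergency rounding patch")}
exported = [
    {"id": "INC-1", "component": "billing"},
    {"id": "INC-2", "component": "search"},
    {"id": "INC-3", "component": "billing"},
]

orders, unmapped = correlate(exported, debt_map)
for order in orders:
    print(order.assignee, "-", order.summary)
print("Candidates for new debt mapping:", unmapped)
```

In a real setup, the assignee on each work order is also the natural target for an alert, and the unmapped incidents would feed back into the debt-mapping application for review.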
Such a framework makes it possible to eliminate technical debt incrementally as part of a system for incident management and response, and provides an automated way of making sure that the debt is actually dealt with. Incident management is a key part of the framework, providing tools for detecting debt-related problems, alerting responsible parties, and scheduling code revisions to fully eliminate technical debt. That way, the debt won’t simply wind up being kicked a little farther down the road.