Being on-call is already a demanding and sometimes very unforgiving responsibility. If you are working in a regulated industry, however, the demands that incident management places on your organization are likely to be even greater and even less forgiving. In this article, we’ll discuss some of the basic principles of software-related incident management in regulated industries.
First, however, let’s take a quick look at what a software-related incident means in a regulated industry. If you were to ask most people in software development or IT to define “incidents”, they would probably talk about downtime or poor application response time. Security could be another important factor: break-ins, data theft, failure to protect sensitive data, and so on.
But in regulated industries, the term “incident” has a scope that goes far beyond downtime and security issues; it can be anything that places the organization or its products or services out of compliance with regulations. For a water company, that might be the presence of E. coli bacteria in the water supply. For a bank, it could be the loss of customer financial data. For a hospital, it could be the failure of critical life-support systems. When regulatory compliance is at stake, incidents involving public safety, the loss of crucial data, or the interruption of key services may be at least as important as those involving ordinary downtime.
One of the most fundamental issues for any organization involved in a regulated industry is the need to stay in compliance with applicable regulations. Depending on the industry and the nature of the incident, being out of compliance can have serious consequences, up to and including legal action.
In other words, the stakes can be very high; you do not want to be in the position of explaining your incident management procedures to a judge.
How do you manage incidents under such strict conditions? The best incident management is prevention — to take care of all potential incidents before they become compliance issues. That isn’t always possible under real-world conditions, so it is important to have incident-response plans which meet both legal requirements and practical necessities. To do this, it’s important to take into account the following factors:
Organizations operating under the Health Insurance Portability and Accountability Act (HIPAA) or the Payment Card Industry Data Security Standard (PCI DSS), for example, must have a documented security-response plan and a response team; the Federal Information Security Management Act (FISMA) likewise includes detailed incident management and response guidelines for federal agencies. Find out which agencies and which requirements your organization is subject to, if you do not already know, and make sure that you are in complete compliance with all requirements.
If there are no specific guidelines for your industry, the Common Criteria and the companion Common Evaluation Methodology (CEM) documents provide a useful framework for understanding general IT security and public-safety issues.
There are some basic considerations which apply to all regulated industries and all regulatory frameworks:
Identify all sensitive systems (applications, networks, services, etc.) in which a failure or other malfunction could lead directly or indirectly to a compliance problem. A database containing client medical records, for example, or a program that manages the distribution of power for a public utility, is likely to fall under this heading. Your company’s bookkeeping software, as important as it may be, is probably not a sensitive system in this context.
Your first line of incident management defense is to prevent any of the systems which you have identified as sensitive from even approaching a state of failure. This means that your incident response team should be alerted not only for any failure in these systems, but for any condition which has the potential to lead to a failure. For security-sensitive systems, this might be any activity which suggests an attempted break-in, or any degradation in performance of the security software itself. For systems where public safety is at stake, this could include any anomalous behavior in any key metric. Needless to say, prevention includes full backups of data, and where necessary, full backup systems on standby.
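To make the idea of alerting before failure concrete, here is a minimal sketch in Python. It assumes a single numeric metric where higher readings are worse, and every name in it (`Thresholds`, `classify`, `should_page`) is hypothetical and chosen for illustration; it is not taken from any particular monitoring product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical limits for one metric on a monitored system."""
    warn: float  # level that suggests a failure may be coming
    fail: float  # level at which the system counts as failed


def classify(value: float, limits: Thresholds) -> str:
    """Return 'fail', 'warn', or 'ok' for one reading (higher is worse)."""
    if value >= limits.fail:
        return "fail"
    if value >= limits.warn:
        return "warn"
    return "ok"


def should_page(value: float, limits: Thresholds, sensitive: bool) -> bool:
    """Page on any failure; for sensitive systems, page on warnings as well."""
    state = classify(value, limits)
    return state == "fail" or (sensitive and state == "warn")


# Assumed example: disk usage in percent on a medical records database.
disk_limits = Thresholds(warn=80.0, fail=95.0)
page_sensitive = should_page(85.0, disk_limits, sensitive=True)
page_ordinary = should_page(85.0, disk_limits, sensitive=False)
```

With these assumed limits, the same 85% reading pages the response team for the sensitive system but not for an ordinary one, which is the asymmetry described above.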
Catching problems before they turn into regulatory compliance failures also requires an incident response team completely in sync, armed with full context from all information sources. In these situations, every second counts! For that reason, it’s vital to have responders defined ahead of time, clear escalation policies, and access to metrics from multiple systems pulled together into a unified view of the issue.
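As a rough illustration of defining responders and escalation ahead of time, the sketch below models an escalation policy as plain data. The roles, the timings, and the names `EscalationLevel`, `POLICY`, and `responders_to_notify` are all assumptions made up for this example; this is not PagerDuty's API or configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationLevel:
    responder: str      # hypothetical role name
    after_minutes: int  # notify once the incident is unacknowledged this long


# Assumed policy, listed in escalation order.
POLICY = [
    EscalationLevel("primary on-call", 0),
    EscalationLevel("secondary on-call", 10),
    EscalationLevel("engineering manager", 25),
]


def responders_to_notify(policy, minutes_unacknowledged):
    """Return everyone who should have been notified by this point."""
    return [
        level.responder
        for level in policy
        if level.after_minutes <= minutes_unacknowledged
    ]


notified_at_12_minutes = responders_to_notify(POLICY, 12)
```

Writing the policy down as data, rather than leaving it to memory, means nobody has to work out who is next in line while an incident is in progress.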
You will in effect need to add another level of priority to your existing incident management triage, giving all compliance-related incidents overriding priority. Suppose your bookkeeping and inventory systems both crash completely and, at the same time, your medical records database starts to act like it’s just a bit under the weather. If you don’t have enough IT people on hand to attend to everything, your accounting staff and warehouse crew may need to stand around until your emergency response team has taken care of the database. And if public safety is involved, your response team may need to be ready to keep crucial systems going in the immediate aftermath of a major disaster.
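The overriding priority for compliance-related incidents can be sketched as a two-part sort key. This is an illustration under assumed names (`Incident`, `triage_order`) and an assumed severity scale where 1 is the most severe; your own triage scheme will differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    name: str
    severity: int             # assumed scale: 1 is most severe
    compliance_related: bool


def triage_order(incidents):
    """Compliance-related incidents first, then by severity.

    False sorts before True, so `not compliance_related` puts
    compliance-related incidents at the front of the queue.
    """
    return sorted(
        incidents,
        key=lambda incident: (not incident.compliance_related, incident.severity),
    )


queue = triage_order([
    Incident("bookkeeping outage", 1, False),
    Incident("inventory outage", 1, False),
    Incident("medical records database degraded", 3, True),
])
first_up = queue[0].name
```

Here the mildly degraded medical records database is handled before two complete outages, because the compliance flag is compared before severity.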
All of this may sound formidable, and expensive as well. But the cost of a major incident can be much higher, particularly if a regulatory agency or a judge determines that your company has failed to adequately comply with regulations. The bottom line for you and your company is that preventative incident management is by far the best protection you can have.
If you’re looking for a resource to improve your incident response processes and workflows, check out our open-sourced incident response documentation as well as our financial services solutions brief for an example of how PagerDuty helps regulated industries.