Turn any signal into insight and action. See how PagerDuty Digital Operations Management Platform integrates machine data and human intelligence to improve visibility and agility across organizations.
Connect insights to real-time action by aligning teams through the shared language of business impact.
Check out the latest products we’ve been working on—including event intelligence, machine learning, response automation, on-call, analytics, operations health management, integrations, and more.
Digital Operations Management arms organizations with the insights needed to turn data into opportunity across every operational use case, from DevOps, ITOps, Security, Support, and beyond.
Over 300 Integrations
Discover DevOps best practices with our library of webinars, whitepapers, reports, and much more.
Learn best practices and get support help with resources from our award-winning support team.
See how PagerDuty works with our live product demo — twice a week, every week.
We've created a maturity model to assist on the journey to digital operations excellence. Take our short assessment to find out where your team falls!
Interactive, simple-to-use API and technical documentation enables users to easily try updates and extend PagerDuty.
Engage with users and PagerDuty experts from our global community of 200k+ users. Become a member, connect, and share insights for success.
Get all your PagerDuty-related questions answered by exploring our in-depth support documentation and community forums.
In a world where everything comes down to moments of truth, teams must respond to issues and opportunities in seconds. Rising customer expectations demand real-time...
PagerDuty helps organizations transform their digital operations. Learn more about PagerDuty's mission and what we do.
Meet our experienced and passionate executive team.
We are risk-taking innovators dedicated to delivering amazing products and delighting customers. Join us and do the best work of your career.
With the PagerDuty Foundation, we are committed to doing our part in giving back to the community.
The threat landscape is expanding at a crazy pace. There are new vulnerabilities released every day, and the amount of servers, applications, and endpoints for ITOps to manage is continually growing. These threats are also growing more potent and frequent, as a recent spate of global ransomware attacks have seen perpetrators extort thousands of dollars. Experts believe that they’re often a ruse that mask attempts to destroy data.
As organizations adopt bimodal ITOps methodologies in order to be more agile, avoiding incidents and increasing security can pose quite a challenge. Some new challenges include leveraging containers and public cloud resources, managing security incidents across these separate data domains, and working entirely new sets of pseudo-admin users who have access to key resources. To enable full stack visibility and incident resolution for the ever-expanding demands on ITOps, a multifaceted strategy to SecOps is required. In fact, I tend to think of SecOps incident management as a necessary combination in order to build a truly secure environment that is both actionable and visible.
First and foremost, reducing the complexity of your SecOps stack will help you maintain actionability while enforcing your SecOps policy. To put it simply, thwart the attack and notify your ITOps team that it needs to remediate. Simplicity is key when reducing the noise of your security alerts and incidents so you can focus on the signals that truly matter. SecOps practices advise that teams leverage a built-in stopwatch to react as quickly as possible and ensure threats are stopped before they do damage to production SLA’s and critical data. The best examples of this severity is when networks and systems are exposed to Zero-Day Threats or ransomware. In these cases, the key is to build a strategy around stopping and preventing exposure to massive threats while issuing alerts to your incident management system. In the case of crypto-ransomware, such as Cryptolocker and Cryptowall, the goal is to leverage tools that prevent the ransomware from engaging the threat (Stage 2 of the below infographic from Sophos), thereby preventing the handshake and averting the crypto infection.
We can then ensure that firewalls, endpoints, third party security monitoring tools, and other relevant data sources are piped into a central incident management solution. This way, SecOps and ITOps can be immediately notified and equipped with the data and workflows required for effective investigation and remediation of high-priority issues. Using effective security tools remains crucial for the success of managing your security incidents.
The ability to not just detect and notify, but also enrich, escalate, and facilitate remediation and future prevention of issues are equally as important in best practice, end-to-end security incident lifecycle management. Again, to accomplish this full stack visibility, you’ll want to integrate and aggregate all of your security systems into a central incident management solution. For example, configure your firewalls and network devices to aggregate information into your monitoring platform by leveraging SNMP traps/queries, as well as integrating syslog servers to send all security incidents to these sources.
When configuring your firewall and network syslogging, you can save a significant amount of time and reduce alert fatigue by configuring thresholds for warning and critical alerts versus info and debug alerts. Depending on your vendor, thresholding can vary. However, with SNMP, filtering the OID to disregard information-based and debug alerts while permitting alerts from warning and critical status messages, ensures that only high-priority alerts get sent to your incident management system.
With syslogging, you can set more granular logging conditions, but the key here is to keep the noise down and only notify on specific conditions. Once you’ve aggregated these events into your monitoring system, you can establish a framework to enriches the alerts with actionable information and routes them to your team to remediate threats.
Syslogging can be valuable for a few reasons. Not only does it capture detailed information on the security and the network data flowing into your monitoring systems, it can also facilitate intrusion detection and prevention as well as threat intelligence. Instead of piping your syslog directly into a monitoring system, you also have the option to send your syslog data into a third party intrusion analysis system like AlienVault or LogRhythm to increase your intrusion visibility and enrich your logging data, creating actionable alerts. Then you can send those alerts to your incident management system (such as PagerDuty) so you can group related symptoms, understand root cause, escalate to the right expert, remediate with the right context, and view and construct analytics and postmortems to improve future security incident response.
Finally, the same framework can be implemented for organizations with hybrid cloud or public cloud resources, although you will need to leverage different third party tools to analyze and enrich your visibility and alerting. For example, leveraging Azure Alerts when leveraging Microsoft Cloud or AWS Cloud Watch when utilizing Amazon’s cloud will allow you to configure similar thresholding and noise reduction with your public cloud server monitoring and alerting. The good news is that there are also third party tools such as Evident.io and Threat Stack that will conveniently perform security-focused analyses across your cloud infrastructure, for anyone with an agile, public, hybrid, or bimodal ITOps strategy.
Whatever suite of tools and systems you prefer to leverage when designing full stack incident management processes that fit your SecOps team, the fundamentals of simplicity, visibility, noise reduction, and actionability remain paramount to success. ITOps and SecOps teams are in very similar positions in which the demands of the business often conflict with the ability of these teams to ensure secure and efficient access across an ever-growing list of devices, services, and other endpoints.
To learn more about best practices for security incident response, check out PagerDuty’s open-sourced documentation, which we use internally. You’ll get an actionable checklist and insights on how to cut off attack vectors, assemble your response team, deal with compromised data, and much more. We hope that these resources will give you a head start in building a solid framework for optimizing SecOps with effective incident management, as that will be your recipe for success.
This blog was co-authored by myself and Simon Darken. Once a year, PagerDuty’s SREs get together for a three-day, in-person offsite. With the team spread...
In the United States, it’s almost that time of year again where we count our blessings and give thanks. For retail workers, it’s also that...
600 Townsend St., #200
San Francisco, CA 94103
905 King Street West, Suite 600
Toronto, ON, M6K 3G9, Canada
1416 NW 46th St., St. 301
Seattle, WA 98107
5 Martin Place
1 Fore St,
London EC2Y 9DT
© 2009 - 2018