PagerDuty Blog

Security Monitoring, Alerting and Automation

Constant validation is an essential piece of PagerDuty’s security methodology – and it takes place by way of continuous monitoring and alerting. A robust monitoring system helps us proactively detect issues and resolve them quickly.

Here are a handful of the monitoring and alerting tactics that we employ.

Port Availability Monitoring

Using our dynamic firewalls, we maintain a list of ports that should be open or closed to the world. Since this information is held in our Chef server, we are able to build out the checks for which ports should be open or closed on each server. We then run these checks continuously, and if one fails, we receive a PagerDuty alert for it. We use a framework called Gauntlt to do this, as it makes simple checks against infrastructure security very easy.

Centralized Logging and Analysis

We currently use Sumologic for our centralized logging. From a security standpoint, we do this because one of the first things an attacker can do is to shut down any logging to hide their tracks. By shipping these logs somewhere else, setting up pattern alerts on them, we can quickly react to problems that we find. In addition to this, we also use OSSEC to collect and analyze all syslog and application log data.

Active Response

Lastly, for well understood attacks, we have tools in place that can take action without any input from our team members. We are still very early in our active-response implementation, but as our infrastructure grows, we will need to build our more of these solutions so we are not constantly reacting to security incidents.

  • DenyHosts. We have deployed DenyHosts to every server in our infrastructure. If a non-existent user tries to login or if there is another brute force attack, we actively block the IP. While we have external SSH disabled on our infrastructure, we still leverage a set of gateway or ‘jump’ servers to access our servers. Since setting this up last July, we have blocked 1,085 unique IP addresses from accessing our infrastructure.

  • OSSEC. We use the open-source intrusion detection system OSSEC for detecting strange behavior on our servers. It continuously analyzes critical log files and directories for anomalous changes. OSSEC has different ‘levels’ of alerts; low- and medium-level ones will send out an email, while high-level alerts will create a PagerDuty incident so a member of our Operations team can immediately respond to the problem. We are not currently leveraging OSSEC’s built-in blocking abilities, but as we learn more about the common attack patterns on our infrastructure, we plan on enabling them.

Being proactive about monitoring is how we keep our services up and running. The active-response tools listed above hint at where we’d like to go with our security architecture.