September 19, 2018
Smart devices require smart monitoring. That’s not a platitude. It’s an imperative. In fact, the smarter the device, the smarter you need to be about monitoring it.
As headlines have shown, unmonitored, unprotected smart devices may be a disaster (or a DDoS attack) just waiting to happen. Consider the following:
Last year’s wave of DDoS attacks was a wake-up call. Many smart devices have little or no built-in security. That weakness, combined with wireless communication and the sophisticated features of their built-in operating systems, makes them particularly tempting targets for an attack.
It’s worth noting that the DDoS attacks simply made use of smart devices as nodes in a botnet, without specifically exploiting their hardware control or monitoring capabilities. It is not difficult to imagine a more targeted attack aimed at specific classes of smart devices and designed to make use of those hardware control capabilities, possibly with catastrophic consequences.
Even an attack on a single device could cause considerable damage, depending on its function. Given the potential danger and lack of built-in security, adequate monitoring is a necessity for detecting and preventing future attacks.
Building so much intelligence into smart devices gives them the capability to handle tasks that would be too complicated for traditional, non-smart devices. A smart device may control complex mechanical actions, balance power loads, or adjust environmental conditions based on sophisticated algorithms processing inputs from a variety of sensors. The more complex a system is, the greater the chances of significant errors. Monitoring smart devices serves as a safeguard against such errors.
It’s one thing if a smart toaster chars your English muffin, but when a smart medical device is controlling a suite of life-support systems, there’s no margin for error.
Smart devices may run expensive factory-floor equipment, traffic lights, power production and distribution facilities, as well as other important (and sometimes crucial) resources. Even a smart system for monitoring and balancing home electrical power may cause significant problems and economic losses if it fails. And when the failure of a system could endanger life or public safety, a major malfunction is simply not acceptable. It would be irresponsible and dangerous not to consistently monitor smart devices that control crucial functions, and that negligence could lead to significant legal repercussions or worse.
What is the best approach to monitoring a smart device? To a considerable degree, that depends on the device itself — what it does, what kind of built-in or bundled control and monitoring software it includes, and what kind of monitoring data it makes available. These are the key points to keep in mind:
A device may come with software that includes monitoring functions, or it may simply produce raw data that can be used for monitoring. Does it have an API? If it is compliant with the Open Connectivity Foundation (OCF)’s specifications for smart devices, it should include well-defined methods for querying and monitoring the state of the system. If it has a proprietary API, what kind of monitoring functions are included? Even without any kind of real API, however, it may be possible to extract some kind of monitoring output.
What monitoring information is available, and what information is important to you?
A complex OCF-compliant device may be able to provide you with a wide range of data about its operational state. When you have an abundance of monitoring data, you need to decide which information should be monitored on an ongoing basis (to detect malfunctions and generate alerts, for example) and which can simply be logged, so that responders can focus on what matters.
A simple non-compliant device, on the other hand, may produce nothing more than a cryptic (and non-specific) error code when it detects a failure or an out-of-range value. In that case, you simply have to take what you can get, and make the best of it.
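One way to make the "monitor versus log" decision explicit is to write it down as a small policy table. The sketch below is only an illustration: the metric names, ranges, and the `triage` helpers are hypothetical and do not come from the OCF specifications or any particular device.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Handling(Enum):
    ALERT = "alert"  # evaluated on every reading; may trigger an alert
    LOG = "log"      # recorded for later analysis only


@dataclass(frozen=True)
class MetricPolicy:
    handling: Handling
    low: Optional[float] = None   # lowest acceptable value, if any
    high: Optional[float] = None  # highest acceptable value, if any


# Hypothetical metrics for an imaginary device; real names and ranges
# would come from the device's own documentation.
POLICIES = {
    "coolant_temp_c": MetricPolicy(Handling.ALERT, low=5.0, high=80.0),
    "supply_voltage": MetricPolicy(Handling.ALERT, low=11.0, high=13.0),
    "firmware_uptime_s": MetricPolicy(Handling.LOG),
}


def triage(metric: str, value: float) -> str:
    """Return 'alert', 'ok', or 'log' for a single reading."""
    policy = POLICIES.get(metric)
    # Metrics without a policy, or marked LOG, are only logged.
    if policy is None or policy.handling is Handling.LOG:
        return "log"
    too_low = policy.low is not None and value < policy.low
    too_high = policy.high is not None and value > policy.high
    return "alert" if (too_low or too_high) else "ok"


def triage_error_code(code: str) -> str:
    """For a device that emits only opaque error codes, nothing is known
    about severity, so this sketch treats every non-empty code as
    alert-worthy."""
    return "alert" if code.strip() else "ok"
```

For example, `triage("coolant_temp_c", 95.0)` falls outside the assumed range and comes back as an alert, while any reading of `firmware_uptime_s` is only logged.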
Even if you decided during the specification and design stages which information your monitoring system needs to include, you may still need to filter out background noise (transient, slightly out-of-range values, for example) on an ongoing basis, so that you can more clearly identify conditions that require an alert.
Alert noise is more than just an annoyance. If there is too much of it, it may mask signals that do require attention. Worse, it can even cause alert fatigue in response teams, so that they fail to recognize alerts that do require immediate attention.
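A common way to suppress transient, slightly out-of-range values is to require several consecutive out-of-range readings before raising anything. The following is a minimal sketch of that idea; the class name, the bounds, and the default of three readings are assumptions chosen for illustration, not recommended settings.

```python
from collections import deque


class ConsecutiveBreachFilter:
    """Signal a breach only after `required` out-of-range readings in a row."""

    def __init__(self, low: float, high: float, required: int = 3):
        self.low = low
        self.high = high
        self.required = required
        # Keeps only the most recent `required` in-range/out-of-range flags.
        self._recent = deque(maxlen=required)

    def update(self, value: float) -> bool:
        """Record one reading; return True only if the last `required`
        readings (including this one) were all out of range."""
        out_of_range = not (self.low <= value <= self.high)
        self._recent.append(out_of_range)
        return len(self._recent) == self.required and all(self._recent)
```

With bounds of 0 to 100 and `required=3`, a single spike to 120 between normal readings is ignored, while three readings in a row above 100 return `True`. The filter keeps returning `True` for as long as the breach continues, so a caller that wants a single alert per episode would need to alert only on the change from `False` to `True`.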
Analysis, sorting, and dispatching, like filtering, are all key elements of any issue resolution system. If an alert produced by a smart device is important enough to require a response, it should be correlated with other related symptoms (to reduce responder noise), automatically routed to the appropriate responders, and enriched with relevant troubleshooting information to streamline the response.
Registering an alert in your system is only the beginning of the resolution process. Without effective, real-time information consolidation and dispatching, even the highest-urgency alerts may be in danger of getting lost.
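The correlate, route, and enrich steps described above can be sketched in a few lines. Everything named here is hypothetical: the device classes, team names, troubleshooting notes, and the five-minute correlation window are placeholders, and a real deployment would normally rely on an incident-management platform rather than hand-rolled code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Alert:
    device_id: str
    device_class: str
    symptom: str
    timestamp: float  # seconds since some reference point


@dataclass
class Incident:
    device_id: str
    team: str            # who gets notified
    notes: str           # troubleshooting hints attached for responders
    symptoms: List[str] = field(default_factory=list)
    last_seen: float = 0.0


# Hypothetical routing and troubleshooting tables.
ROUTES = {"hvac": "facilities-oncall", "infusion-pump": "clinical-engineering"}
NOTES = {"hvac": "Check airflow and compressor status first."}
DEFAULT_TEAM = "ops-oncall"


def correlate(alerts: List[Alert], window_s: float = 300.0) -> List[Incident]:
    """Group alerts from the same device that arrive within `window_s`
    seconds of each other into one incident, routed and annotated."""
    incidents: List[Incident] = []
    open_by_device: Dict[str, Incident] = {}
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        incident = open_by_device.get(alert.device_id)
        if incident is None or alert.timestamp - incident.last_seen > window_s:
            # No recent incident for this device: open a new one.
            incident = Incident(
                device_id=alert.device_id,
                team=ROUTES.get(alert.device_class, DEFAULT_TEAM),
                notes=NOTES.get(alert.device_class, ""),
            )
            incidents.append(incident)
            open_by_device[alert.device_id] = incident
        incident.symptoms.append(alert.symptom)
        incident.last_seen = alert.timestamp
    return incidents
```

Grouping by device and time window is only one possible correlation rule. It is used here because it keeps a burst of related symptoms from the same device from notifying responders several times.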
Smart monitoring for smart devices isn’t an option — it’s a necessity. Knowing this, how can you get started in implementing the right solution? The key to monitoring smart devices lies in making optimal use of the data that each device provides. In the case of complex, OCF-compliant devices, this could mean sending monitoring data to customized control software in order to automatically respond to out-of-bounds conditions. The control software could then decide whether or not to generate an alert based on the device’s response to the adjustments made by the control software.
In effect, this adds an extra layer of smartness to the device and the monitoring system, taking advantage of the rich set of features included in the OCF specifications. Even in the case of a relatively simple, non-compliant device, it’s possible to use whatever data is provided to set up genuinely smart monitoring and produce a genuinely smart response. In a world with ever more unknowns, implementing such a solution can help your teams minimize the potential financial and security repercussions of smart devices gone rogue.
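The "adjust first, alert only if the device does not recover" behavior of the control software might look something like the sketch below. The function name, the callbacks, and the retry count are all assumptions made for illustration; how a real device is read or adjusted depends entirely on its API.

```python
from typing import Callable


def respond_to_reading(
    read_value: Callable[[], float],
    apply_adjustment: Callable[[], None],
    low: float,
    high: float,
    max_attempts: int = 2,
) -> str:
    """Return 'ok' if the reading is in range, 'recovered' if an
    adjustment brought it back in range, or 'alert' if it stayed
    out of range after `max_attempts` adjustments."""
    if low <= read_value() <= high:
        return "ok"
    for _ in range(max_attempts):
        apply_adjustment()
        # A real system would wait for the device to settle before
        # re-reading; that delay is omitted to keep the sketch short.
        if low <= read_value() <= high:
            return "recovered"
    return "alert"
```

Here `read_value` and `apply_adjustment` stand in for whatever calls the device actually exposes. Only the `"alert"` outcome would need to reach a human responder, while `"recovered"` could simply be logged.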