A crash course on how to do security and safety, a sneak peek into the future of security and safety risk assessments, and a reflection on where standardization (ISA TR 84.00.09 and IEC TR 63069) is moving.
May I introduce you to a term that's very helpful in understanding the nature of the security and safety problems? It's called emergence.
Emergence is "the magic that happens when a system comes together". Or, more formally: if we define a system as a set of entities and their relationships, emergence is what happens on top. Emergence is the functionality that emerges out of the combination of all system parts, and that is more than the sum of the parts.
Now, emergence is a great thing. But it also has downsides. Obviously, no one guarantees you that a system's emergence is always desirable. A plant produces a product: great. But a plant could also produce a gas that is harmful to the environment: not so great.
There can also be unanticipated emergence. The plant may produce a byproduct that actually is valuable to another industry: a great surprise! But maybe that byproduct combined with the harmful gas causes explosions, which would be a not-so-great surprise; in other words, unanticipated and undesirable.
The era of resilience
There we have it, the common objective of security and safety: We analyze undesirable emergences, trying to prevent them. Or, if we want to make it sound more optimistic: Security and safety have the common objective of making systems more resilient, which means making them able to cope with undesirable changes — both anticipated and unanticipated.
Let that sink in for a moment. There's even a term for that: "resilience engineering". We don't have the time to go into the details of that young discipline, but you can think of it as an umbrella discipline above security AND safety engineering.
And here is a crash course on why it is increasingly important.
For a long time, it was enough to be a safety engineer to deal with undesirable emergences. But the problem is: System complexity has increased. And with system complexity, the spectrum of undesirable emergence has increased.
Safety has become better and better at dealing with the anticipated part, and most of that progress came from learning from errors:
First, safety anticipated mainly technology failures, like metal fatigue.
Then we started to learn that it also needs to anticipate human errors, like too many alarms distracting an operator from the one important alarm.
The next era was anticipating socio-technical failures, for example a culture of concealment — that‘s why safety invented such a thing as „safety culture“.
To all these events, safety has an answer. The events get included in the system‘s design base, and safety engineers design a system response in case they happen.
The problem is that as the system becomes increasingly complex, it becomes impossible to include all undesired events in a system's design base. For software-based, networked systems that offer an ever-growing playground for human creativity (and human maliciousness), it is simply not feasible to anticipate all undesirable events and design a "safe state" for them. In that case, safety's methods may fall short, or may need to be complemented with something more.
System failures due to malicious, creative actions are better analyzed and countered using security methods, hence by looking at the systems from a different perspective than safety does.
Safety’s strength is analyzing and preventing non-malicious technical, human, and socio-technical failures.
Security’s strength is analyzing and countering malicious, creative, human-caused manipulations.
Dealing with unanticipated events in highly complex systems is a challenge that only both disciplines together can face.
What security and safety can share — and what they can’t
While it's a good start (and by no means a matter of course) to agree that safety and security have a common objective, namely resilience, that alone is not enough to work with. We need to break it down into something more practical.
In a crash course form, what you roughly need to know to carry out a security and safety risk assessment is pictured here:
There are two steps that security and safety can share:
- First, defining a shared objective.
- Second, transforming the objective into a shared scope.
Then comes the third: analyzing the scope's risk. This is where you need two different perspectives, one for safety and one for security; one for non-malicious failures, and one for all creative manipulations.
The two perspectives, security and safety, have a shared objective and scope, but not shared methodologies. While this may sound simple, security and safety experts have been working hard on this consensus in joint working groups over the last few years.
Defining shared objectives
Now we take a deeper look into defining the shared objective. Beyond resilience — what can be common objectives for safety and security?
Resilience against what, exactly?
Or, in more practical terms: What would be intolerable risk?
In our analysis, do we need to protect primarily humans? The environment? Assets? Information? Business? You can of course say "all of them", but there are reasonable scenarios where some of these risks are not a priority.
However you define the objective: this objective can be the same for security and safety risk assessments. Both have a shared objective.
Defining shared scope
Next, the objective needs to be translated into a scope to have something tangible to work with during risk assessments.
And there is a way to look at scope that security and safety can share. The keyword is "functions". Functions specify what needs to be achieved, but not yet how it is achieved (for example, by which technical systems). For an industrial control system, you will have basic control functions and complementary functions that support the basic control functions, and you will have safety functions.
What you define as scope you could name "essential functions".
Essential functions are all functions for which interference can lead to intolerable risks. By "intolerable risk" we of course mean exactly the intolerable risk criteria we have defined as an objective before.
So if your only risk criterion is "humans", you need to decide which functions someone could mess with to cause harm to humans, and those would be your essential functions.
Safety functions are mostly completely in scope, but the important message here is: it's not necessarily ONLY safety functions. Depending on your objective, a large part of your basic control and complementary functions could also be in scope. And, remember: this is a shared scope as a basis for both safety and security assessments.
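This scoping step can even be mechanized in a simple way. The sketch below is a toy illustration under invented assumptions: the function names and their risk mappings are made up for this example, not taken from any standard. It keeps every function whose interference could harm a target named in the objective.

```python
# Toy sketch: deriving "essential functions" from a shared objective.
# All function names and risk mappings below are invented examples,
# not taken from IEC 61511 or ISA/IEC 62443.

# Each function maps to the protection targets its interference could harm.
interference_impacts = {
    "emergency_shutdown": {"humans", "environment"},  # safety function
    "pressure_control":   {"humans", "assets"},       # basic control function
    "alarm_management":   {"humans"},                 # complementary function
    "batch_reporting":    {"business"},
    "recipe_management":  {"business", "assets"},
}

def essential_functions(objective, impacts):
    """Return all functions whose interference can lead to
    intolerable risk for any target named in the objective."""
    return {f for f, targets in impacts.items() if targets & objective}

# Objective: protect humans only.
scope = essential_functions({"humans"}, interference_impacts)
print(sorted(scope))
# ['alarm_management', 'emergency_shutdown', 'pressure_control']
```

Note how a purely business-driven objective would select a different, partly overlapping set of functions; the mechanism stays the same.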
Analyze the scope’s risk from different perspectives
Up to now, everything was shared. Now for the differences.
If you analyze your scope’s risk, you’re asking for the causes: For your essential functions, you’re wondering what can cause interference with these functions.
Depending on cause, you may need different methodologies in your toolbox, as we have stated before. For non-malicious technical failures and human error, safety has a fairly complete toolset. For creative, malicious ways of interfering with a system, you need security risk assessment methodologies.
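As a minimal sketch of that split (the cause categories and methodology labels are simplifications for illustration, not normative mappings from any standard), routing a cause to a toolset could look like:

```python
# Toy routing of interference causes to assessment methodologies,
# reflecting the safety/security split described above.
# Categories and labels are illustrative, not normative.

SAFETY_CAUSES = {"technical failure", "human error", "socio-technical failure"}

def methodology_for(cause):
    """Pick the risk assessment toolset by cause category."""
    if cause in SAFETY_CAUSES:
        return "safety risk assessment (e.g. per IEC 61511)"
    if cause == "malicious manipulation":
        return "security risk assessment (e.g. per ISA/IEC 62443-3-2)"
    # Unanticipated causes: neither toolset alone suffices.
    return "resilience measures from both disciplines"

print(methodology_for("human error"))
print(methodology_for("malicious manipulation"))
```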
So the security and safety methodologies are both legitimate, but different, and they may well be carried out by separate teams. But what you can do is couple them. Coupling means finding the most efficient path to navigate through the jungle of two methodologies.
Coupling also means that like for coupled train carriages, safety and security CAN travel sections of their journey together, but don’t HAVE TO (and there’s a locomotive called resilience).
In an example, we look at coupling the yellow safety risk assessment process from IEC 61511 and the blue security risk assessment process from ISA/IEC 62443-3-2. Shared process steps are green, and we've discussed them before: definition of the shared objective and scope.
What makes a good way of coupling, that is, the most efficient "path through the jungle", is mainly defined by inputs and outputs. To avoid unnecessary iterations, you obviously want to have produced all potentially useful inputs before beginning any process step, and that really is the whole magic.
For example, it makes sense to do the safety risk assessment and allocation of protection layers before beginning to do anything regarding security, because the major hazards and the definition of safety systems and their placement in the network are relevant inputs for the cybersecurity risk assessment. Safety instrumented systems could be attack targets, after all.
Likewise, the existence of safety requirements, for example any “non-hackable” safeguards like release valves or rupture disks, may impact consequence estimations during the security risk assessments.
As another example, take the zones and conduit diagrams produced on the security side after the initial cybersecurity risk assessment. They are not currently required anywhere as an input for the safety requirements specification, but they could well serve as a useful basis e.g. for defining safety system segregation requirements.
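That input/output ordering idea can be sketched as a dependency graph plus a topological sort. The step names and dependencies below are simplified illustrations based on the coupling examples above, not the normative step lists of IEC 61511 or ISA/IEC 62443-3-2.

```python
# Sketch: order coupled process steps so every step runs only after
# all of its inputs exist. Steps and dependencies are simplified
# illustrations of the coupling discussed above, not normative.
from graphlib import TopologicalSorter  # Python 3.9+

# step -> set of steps whose outputs it consumes
dependencies = {
    "shared objective":                   set(),
    "shared scope (essential functions)": {"shared objective"},
    "safety risk assessment":             {"shared scope (essential functions)"},
    "allocation of protection layers":    {"safety risk assessment"},
    "initial security risk assessment":   {"allocation of protection layers"},
    "zones and conduits":                 {"initial security risk assessment"},
    "detailed security risk assessment":  {"zones and conduits"},
    "safety requirements specification":  {"zones and conduits",
                                           "allocation of protection layers"},
}

# A valid execution order: each step appears after its inputs.
order = list(TopologicalSorter(dependencies).static_order())
for step in order:
    print(step)
```

The ordering reproduces the article's examples: safety risk assessment and allocation of protection layers come before the security assessments, and the zones and conduits diagram is available before the safety requirements specification.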
The important point to make here is: Methodologies for safety and security can and need to stay separate, but they can be coupled for efficiency. And sometimes, because of this coupling, the coupled process even becomes better than the sum of both non-coupled processes... which by the way is a prime example of emergence.
The coupling work you're seeing here is actually a sneak peek into ongoing ISA84 work, which leads us to our last message for today.
Two security and safety projects to watch
The three risk steps in a security-safety risk analysis we've seen today can be pictured as an increase in the level of detail: a high-level shared objective, broken down into a shared scope, and then analyzed more deeply from different perspectives in a coupled risk analysis process.
If you're interested in learning more, here are two standards, or more precisely technical reports, to watch out for. They differ in the level of detail they focus on.
IEC TR 63069, whose revision is starting in these weeks, is more high-level. It defines general principles for combining security and safety, like the shared objective and scope you've learned about today, and it plans to give examples of coupled risk assessment processes in its next edition.
ISA TR 84.00.09, which has been under revision since 2019, is more on the low-level coupled process and methodologies end. It aims to define the completely coupled safety-security lifecycle, not only for risk assessments, but also for design, maintenance, and decommissioning. In addition to the coupled processes, the next edition of the ISA TR will also include example methodologies for each lifecycle phase.
Before you go, I would like to leave a statement here that one of my ISA84 colleagues made during our security and safety risk assessment discussions:
Both safety and security need to spread their wings and learn what the other is doing.
This article is based on a conference talk at ICS CyberSec Conference 2021, given by the author on Feb 11, 2021.