* Review *

Risk analysis and dependency

Fact of the week

Much of bank security relies on the existence of "tamper-proof" technologies, either by relying on physical isolation of systems, or by building systems which self-destruct if tampered with. Tamper resistance is almost impossible to achieve in a public arena.

Last week we talked about possible forms of attack against different kinds of system. This week, we need to examine a method for analyzing systems, in order to find their weaknesses and detail our own assumptions about their security.

If we want to talk about security in a more serious way (as more than a video game), we have to say what we mean by it, in a technical sense. Saying that "we want security" is not good enough, because it is too vague. We need to:

For instance, if we are an airport, we could begin by saying something like this: we could define a secure system as one in which all of the threats have been analyzed and where countermeasures are in place for all of them. The problem, of course, is how to know what all of the threats are. This course can be divided into discussions of two main themes:
One important lesson that we shall learn here is that "computer security" is not just about computers. It should really be called "security including the use of computers". Security is a property of whole systems (rules and procedures), not of individual parts.

A reminder about law and policy

The foundation of security is policy. If we don't define what is valuable and acceptable, we cannot speak of the risk to those assets. If we don't have a policy, then we don't care what happens. One part of policy is defining what should be allowed; another is defining what should happen if something disallowed occurs:

Policy (sometimes formalized as "law") is a principle of society. Society is a system (a set of rules and procedures). The first countermeasure to the breakdown of this discipline is a deterrent: the threat of retaliation. "If you do this, we will not like you, and we may punish you!" Society needs mechanisms, bureaucracies, police forces and sometimes military personnel to enforce its rules, because there are always a few individuals who do not understand the discipline.

In most cases, especially with computer crime, organizations have few possibilities for reprimanding those who break policy, except to report them to law enforcement agencies. Each country has its own national laws which override local policy, i.e. local security policy has to obey the law of the land. This sometimes causes problems for either side. For instance, in some countries, encryption is forbidden by the government, i.e. citizens do not have the right to privacy; in others, system administrators are not allowed to investigate users suspected of having committed a crime, since it would be a violation of their privacy. These are the opposite ends of the spectrum.

Nowadays, law-enforcement agencies (police forces) take computer crime more seriously, but computer crime has all the counterparts of major crime, organized crime, and petty crime. Because the idea of lawful behaviour in a virtual world is still new, computer crime (ignoring local policy rules) is dominated by petty crime, perpetrated by ignorant or selfish users who do not see their behaviour as criminal. Recall the principle of communities from the last lecture.

The principle of communities: What one member of a cooperative community does affects every other member, and vice versa. Each member of a community therefore has a responsibility to consider the well-being of other members of the community.
This same rule generalizes to any system (=society) of components (=members).

Policy Dependencies/Failure modes

The basis of trust is this: if we believe we know how something will behave, we trust it. That is one of the main reasons why people can be fooled by criminals.
Example 1: we have become used to using ATM mini-bank terminals for withdrawing money. These are now everywhere, in all shapes and sizes. We trust these terminals to give us money when we enter our private codes, because they usually do. One attack used by criminals is to install their own fake ATM which collects PIN codes and card details and then says "An error has occurred", so that the user does not get any money. The criminals have then stolen the card details.
This kind of "scam" is common. It aims to exploit your trust. The same can, in fact, be said about any other kind of crime. Here is another example of misplaced trust:
Example 2: airport staff do not trust that passengers will not carry weapons, so they use metal detectors, because most weapons are metallic. They trust their metal detectors to find any weapons. A stone knife could be easily smuggled onboard a plane.
Closer to home:
Example 3: a computer user downloads large files of pornographic material, filling up the disk. This violates policy, but the system manager does not enforce this policy very carefully, so users can ignore it. Here the trust goes both ways. The system administrator trusts that most users will not break this rule, and the users trust that the system administrator will not enforce it.
There are many reasons why people will violate policy. All of them lead to a "payoff" for the attacker (to use a word from game theory), i.e. a perceived win. What makes humans interesting is that we perceive reward or gain in both material and emotional ways. For instance, what is won by vandalism, or warfare? The answer is that those who commit these acts feel that they gain from others' suffering. That is a uniquely human trait.

Not all security violations are intentional, of course. Lost luggage is generally caused by human error. Programming errors are always caused by human error. Human error is thus, either directly or indirectly, responsible for a very large part of security problems. Here are a few examples of human errors which can result in problems:

Systems and failure

An American admiral (Grace Hopper) is reputed to have said: "Life was simple before World War II. After that, we had systems." This hits the nail on the head for security.

Our increasing use of systems (computer systems, security systems, bureaucratic systems, quality control systems, electrical systems, plumbing systems) is an embrace of presumed rigour. In other words, systems expect the players to follow rules and exhibit discipline. Without "systems" we have only efficiency as a gauge of success. With systems, we also need a bureaucratic attention to detail in order to make the system work. Thus systems are inherently more vulnerable to failure, because they require precision (which is something most humans are not good at).

Systems are characterized by components and procedures which fit together to perform a job. Usually the components are designed as modules, which are analyzed and tested one by one. The analysis of whole systems is more difficult, and is less well implemented. This means that there are two kinds of systemic fault: faults in the individual components, and faults in the way the components interact as a whole.

Such faults can be exploited by attackers in order to manipulate the system in undesirable ways. A final cause of failure is the unexpected catastrophe, which is everyone's worst nightmare: fire, earthquakes and bomb blasts can all destroy assets once and for all.

Prevention/correction

Protecting ourselves against threats also involves a limited number of themes: prevention, detection and correction. We need to apply these to environments which make use of computer systems.

Redundancy - calculating risk

One often hears of systems which have a "single point of failure". For example, our internet connection is a single point on which all of our communication with the outside world depends. If someone cuts this line, our communications are dead.

Why don't we have a backup? Sometimes the reason is a design fault, and other times it is a calculated risk.


 Serial : single point of failure (OR)

          --------         ----------
 --------|        |-------|          |--------
          --------         ----------

 Parallel: multiple points of failure - redundancy (AND)

                  --------   
         --------|        |-------
        |         --------        |
        |                         |
        |         --------        |
 -------|--------|        |-------|-----------
        |         --------        |
        |                         |
        |         --------        |
         --------|        |-------
                  --------   

You might remember these diagrams from electronics classes: Kirchhoff's laws for electric current. We can think of failure rate as being like electrical resistance: something which stops the flow of work/current.
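
To make the analogy concrete, here is a small sketch (in Python; not part of the original notes) of how the two diagrams translate into probabilities, assuming each component fails independently with a known probability.

 # Sketch: failure probability of serial vs. parallel arrangements,
 # assuming independent component failures (illustrative only).

 def serial_failure(probs):
     # A serial chain fails if ANY component fails (the OR case).
     p_all_work = 1.0
     for p in probs:
         p_all_work *= (1.0 - p)
     return 1.0 - p_all_work

 def parallel_failure(probs):
     # A parallel (redundant) group fails only if ALL components fail (the AND case).
     p_fail = 1.0
     for p in probs:
         p_fail *= p
     return p_fail

 p = 0.1                                          # assumed failure probability per link
 print(round(serial_failure([p, p]), 4))          # 0.19   -- worse than a single link
 print(round(parallel_failure([p, p, p, p]), 6))  # 0.0001 -- redundancy helps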

Fault trees

What is the probability of system failure? Put another way: how do we evaluate the risks, and figure out the best way to protect against them? Fault Tree Analysis (FTA) is a systematic method for doing this. It is a method which is used in critical situations, such as the nuclear industry and the military. It is a nice way of organizing an overview of the problem, and a simple way of calculating probabilities for failure. If we were security consultants, this is how we could impress a customer with an in-depth analysis. By drawing such a tree, we can understand apparently simple problems in a new light. Computer programs (like fault tree "spreadsheets") exist to help calculate the probabilities.

Fault trees are made of the following symbols:

 Key  Symbol            Combinatoric
 (a)  AND gate          P(out) = P(A)P(B)              (independent)
 (b)  OR gate           P(out) = P(A)+P(B) - P(A)P(B)  (independent)
 (c)  XOR gate          P(out) = P(A)+P(B)             (mutually exclusive)
 (d)  Incomplete cause  (none)
 (e)  Ultimate cause    (none)

These can also be generalized for more than two inputs.

The standard gate symbols give us ways of combining the effects of dependency. The OR gate represents a serial dependency (failure if either one or the other component fails). The OR gate assumes that the events are independent, i.e. the number of possibilities does not change as a result of a measurement on one of the inputs; the XOR gate describes mutually exclusive (and hence dependent) events, since a non-zero value on one input implies a zero value on the rest. The AND gate requires the failure of parallel branches; it could be either dependent or independent. For the sake of simplicity, we shall consider only examples using independent probabilities. This week's exercises are about using these gates.
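
As an illustration (not part of the original notes), the gate rules in the table above can be written as small Python functions; written this way, they also generalize naturally to more than two inputs. The sketch assumes independent inputs for the AND and OR gates and mutually exclusive inputs for the XOR gate.

 # Sketch of the fault-tree gate rules; function names are illustrative only.

 def and_gate(*probs):
     # Output fails only if every input fails: multiply the probabilities.
     result = 1.0
     for p in probs:
         result *= p
     return result

 def or_gate(*probs):
     # Output fails if at least one input fails: 1 minus "no input fails".
     # For two inputs this reduces to P(A) + P(B) - P(A)P(B).
     none_fail = 1.0
     for p in probs:
         none_fail *= (1.0 - p)
     return 1.0 - none_fail

 def xor_gate(*probs):
     # Mutually exclusive causes: the probabilities simply add.
     return sum(probs)

 print(round(or_gate(0.05, 0.1), 4))   # 0.145
 print(round(and_gate(0.05, 0.1), 4))  # 0.005
 print(round(xor_gate(0.05, 0.1), 4))  # 0.15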

Immune system example

The OR gate is the most common:

Examination system example

This is not a proper fault tree; it is just a cause tree. How would you fill in the logic gates?

Combining probabilities

From the properties of the gates, we see that OR pathways add to the probability of failure, while AND pathways multiply it and therefore reduce it. So if we see many OR pathways, we should be scared. If we see many AND pathways, we should be pleased. Here is a simple example of how we work out the total probability of failure, for a simple attack where an attacker tries the obvious routes in: guessing the root password, or exploiting some known loopholes in services which have not been patched.

We split the tree into two main branches: first try the root password of the system, OR try to attack any services which might contain bugs.

Since all the events are "independent", we have:
P(break in) = P(A OR (NOT A AND (B AND C)))
            = P(A) + (1-P(A)) x P(B)P(C)
Suppose we have, from experience, that
Chance of guessing root pw             P(A) =  5/1000 = 0.005
Chance of finding a service exploit    P(B) = 50/1000 = 0.05
Chance that the host is misconfigured  P(C) = 10%     = 0.1

 P(T) = 0.005 + 0.995 x 0.05 x 0.1
      = 0.005 + 0.0049
      = 0.01
      = 1%
Notice how, even though the chance of guessing the root password is small, it becomes an equally likely avenue of attack, due to the chance that the host might have been upgraded (patched) and correctly configured. Thus we see that the chance of break-in is a competition between an attacker and a defender.
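
For reference, the same estimate can be reproduced as a short Python script, so that the assumed numbers can be varied; the variable names are illustrative only.

 # Reproduces the break-in estimate above with the same assumed probabilities.

 p_root_guess = 0.005   # P(A): chance of guessing the root password
 p_exploit    = 0.05    # P(B): chance of finding an unpatched service exploit
 p_misconfig  = 0.1     # P(C): chance that the host is misconfigured

 # P(break in) = P(A) + (1 - P(A)) * P(B) * P(C)
 p_break_in = p_root_guess + (1 - p_root_guess) * p_exploit * p_misconfig

 print(round(p_break_in, 4))   # 0.01, i.e. about 1%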

The problems this week are about taking this idea further.

Thought of the week

It has been said that the only difference between commerce and warfare is politics.
