* 9 *

Software security I

Fact of the week

Internet banks use a system of authentication based on one-time passwords or signatures, generated by a device often called a "digipass" or "calculator". These devices generate passwords from a secret key and the time of day. The generated passwords expire after about 15 minutes.

Chapter 8 Gollmann
So far in the course we have been looking at mechanisms and models for security in software systems from a passive observer's viewpoint. Having considered some of the problems one faces in building secure systems, we now turn our attention to more specific problems which face active software developers. The examples we shall consider here are for POSIX-compliant (standard Unix-like) operating systems, but the lessons we are trying to drive home apply to other operating systems too. Some of the issues we shall discuss have been serious bugs in the past. It is important to realize that, even if these specific problems have been fixed in their historical context, the same problems can and will occur again unless we learn from our mistakes.

Software security design issues

As Gollmann points out, there are three curses in security. These are not just security issues, but more generally quality-control issues. Let's begin with some common-sense issues which directly affect software security.

Formalism is our friend

Secure communication means securing integrity and understanding. Protocols are good for this. A protocol is a formalized behaviour, a kind of form to fill out: it is communication bureaucracy. There are many examples. For instance, in programming there is a protocol for passing information between functions using parameters:
function(i,j);                 /* caller: passes two int parameters */

...

void function (int a, int b)   /* callee: expects exactly two ints */
{
}
In pre-ANSI (K&R) C it was possible to pass a different number of parameters than the function definition accepted, without any complaint from the compiler. This was poor security. In ANSI C and C++ one uses function prototypes to declare the correct number and types of parameters for the protocol. The protocol allows the compiler to check that no mistakes have been made. The result is a more secure, reliable program.
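A minimal sketch of the same protocol with a prototype, using the hypothetical names add_ints and sum_of_pair. Because the prototype comes before the call, a call with the wrong number of arguments, such as add_ints(1), is rejected at compile time instead of silently passing garbage.

```c
/* Prototype: declares the calling protocol (two ints in, one int out). */
int add_ints(int a, int b);

/* A caller written after the prototype is checked against it:
   add_ints(1) or add_ints(1, 2, 3) here would be a compile-time error. */
int sum_of_pair(int i, int j)
{
   return add_ints(i, j);
}

/* Definition: must agree with the prototype, or compilation fails. */
int add_ints(int a, int b)
{
   return a + b;
}
```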

Another example of a protocol is the FTP protocol (see man ftpd), which consists of a stream of commands, each ended by an end-of-line, as in a shell. Indeed, every shell command interface is an implicit protocol.

How much checking should be specified in a protocol? How strict should it be? If a protocol is checked very stringently, there are few chances of its being abused. If we do not check its integrity (e.g. number and type of parameters) then the protocol can be abused.
Protocols exact a discipline which must be upheld by the software, and this requires discipline from programmers too. The advantage of using protocols, however, is that (if they are well designed) they can be checked for consistency.
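As a sketch of stringent checking, here is a validator for a small line-based protocol in the spirit of FTP. The command set (USER, PASS, QUIT), the 64-character argument limit and the name command_is_valid are hypothetical, chosen only for illustration, and the line is assumed to have had its end-of-line terminator stripped already. Anything that does not match the expected form is rejected rather than guessed at.

```c
#include <ctype.h>
#include <string.h>

#define MAX_ARG 64   /* hypothetical limit on the argument length */

/* Hypothetical whitelist of commands for a small FTP-like protocol. */
static const char *const allowed[] = { "USER", "PASS", "QUIT", NULL };

/* Returns 1 if line is "VERB" or "VERB argument", where VERB is on the
   whitelist and the argument is 1..MAX_ARG printable, non-space
   characters; returns 0 for anything else. */
int command_is_valid(const char *line)
{
   const char *space = strchr(line, ' ');
   size_t verb_len = space ? (size_t) (space - line) : strlen(line);
   size_t i;
   int known = 0;

   for (i = 0; allowed[i] != NULL; i++)
      {
      if (strlen(allowed[i]) == verb_len
          && strncmp(line, allowed[i], verb_len) == 0)
         {
         known = 1;
         }
      }
   if (!known)
      {
      return 0;                      /* unknown command: reject */
      }

   if (space != NULL)
      {
      const char *arg = space + 1;
      size_t n = strlen(arg);

      if (n == 0 || n > MAX_ARG)
         {
         return 0;                   /* missing or over-long argument */
         }
      for (i = 0; i < n; i++)
         {
         if (!isgraph((unsigned char) arg[i]))
            {
            return 0;                /* spaces or control characters */
            }
         }
      }
   return 1;
}
```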

The function protocol ends by returning a value. The return value often denotes success or failure. These values should be checked! Programs should fail safely!
In Unix, most system calls return the value -1 if they fail and store the reason in the variable errno. One can then use perror() or strerror(errno) to report the reason for the failure, e.g.
if (chmod("/dir/newfile",0600) == -1)
   {
   perror("chmod");   /* prints "chmod: " followed by the reason */
   return -1;         /* fail safely: report the failure to the caller */
   }
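Building on the chmod() example, the following sketch (make_private is a hypothetical name) reports the reason with strerror() and passes the failure back to its caller instead of carrying on. Saving errno before printing is a precaution, since library calls such as fprintf() may change it.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Make path private (owner read/write only). Returns 0 on success and -1
   on failure, leaving errno set so that the caller can decide what to do. */
int make_private(const char *path)
{
   if (chmod(path, 0600) == -1)
      {
      int saved = errno;               /* fprintf may change errno */
      fprintf(stderr, "chmod %s: %s\n", path, strerror(saved));
      errno = saved;
      return -1;                       /* fail safely: never pretend it worked */
      }
   return 0;
}
```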

Ambiguity is our enemy

Whenever input or output is ambiguous, there is a security problem.

PATH attacks

When one program starts another program, the exact path of the intended program should be given. Some programs rely on the PATH variable to search for a program, but this allows users to place a fake program in a directory earlier in the PATH list, which will then be executed instead, e.g.
#!/usr/bin/perl
# 

# ....

system("ls");  # should be system("/bin/ls");
The same thing applies to popen() or any command which uses a shell. Functions for executing other programs (child processes) can be classified into two types: those which invoke a shell in order to run a program and those which do not. Secure child processes avoid using shells.
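A minimal POSIX sketch of starting a child process without a shell, assuming fork(), execv() and waitpid() are available. The helper run_program is hypothetical; it refuses relative names, so PATH is never searched, and because no shell is involved, characters such as ; and && in the arguments are passed to the program literally instead of being interpreted.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the program at an absolute path with the given argv, without a shell.
   Returns the child's exit status, or -1 on error. Relative names such as
   "ls" are refused, so PATH is never consulted. */
int run_program(const char *path, char *const argv[])
{
   pid_t pid;
   int status;

   if (path[0] != '/')
      {
      return -1;                 /* insist on an absolute path */
      }

   pid = fork();
   if (pid == -1)
      {
      return -1;
      }
   if (pid == 0)
      {
      execv(path, argv);         /* no shell: metacharacters are not special */
      _exit(127);                /* only reached if execv() failed */
      }

   if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
      {
      return -1;
      }
   return WEXITSTATUS(status);
}
```

The Perl example above would correspond to run_program("/bin/ls", argv) with argv holding {"ls", NULL}.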

Even if we avoid using PATH, we are not home and dry. Another example is the so-called IFS attack against the Bourne shell. The IFS variable determines the characters which are to be interpreted as whitespace when the shell splits a command into words. The name stands for Internal Field Separator. Suppose we set it to include the forward slash character (in the Bourne shell and Bash):

IFS="/ \t\n"; export IFS
PATH=".:$PATH"; export PATH
Now run any program which starts another program by its absolute path through a Bourne shell (e.g. with the system() or popen() library calls). The command is now interpreted like this:

system("/bin/mail root");   --->  system(" bin mail root");

which would attempt to execute a command called bin in the user's current directory, since '.' is now at the front of PATH. It is then trivial to plant a Trojan horse called bin in some directory like /tmp where a not-very-bright system administrator tests things. Modern shells generally no longer inherit IFS from the environment, so this particular attack has largely been closed off.
This is why root's PATH should never include '.' (the current directory): it should be restricted to a few trusted directories.
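A program running with privilege could refuse to continue with a dangerous PATH. The sketch below (path_is_safe is a hypothetical helper) accepts a colon-separated PATH only if every entry is an absolute directory; it treats empty entries as unsafe because POSIX interprets a zero-length entry as the current directory. It is a purely lexical check and does not test whether the directories themselves are writable by others.

```c
#include <string.h>

/* Returns 1 if every entry of a colon-separated PATH is absolute; returns 0
   if any entry is ".", relative, or empty (an empty entry, as in
   "/bin::/usr/bin" or a trailing ':', also means the current directory). */
int path_is_safe(const char *path)
{
   const char *p = path;

   for (;;)
      {
      const char *end = strchr(p, ':');
      size_t len = end ? (size_t) (end - p) : strlen(p);

      if (len == 0 || p[0] != '/')
         {
         return 0;               /* empty or relative entry */
         }
      if (end == NULL)
         {
         return 1;               /* last entry checked */
         }
      p = end + 1;
      }
}
```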

File permissions

When we create new files, we must be careful about what permissions they will get. The permissions of a new file are determined by taking the mode requested by the program and removing any bits that are set in the umask. A value can also be set explicitly afterwards, for instance:
chmod("/dir/myfile",0644);
Programs inherit the umask from the process (usually a shell) which starts them. This can interfere with system calls which create files, and do strange things to file permissions. For instance, look at this C code (see exercise this week).

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

int main(void)
{
   FILE *fp;

   umask(0);

   if ((fp = fopen("newfile","w")) == NULL)
      {
      perror("fopen");
      return 1;
      }

   /* What permissions does newfile have now? */

   chmod("newfile",0755);

   /* What permissions does newfile have now? */

   fclose(fp);
   return 0;
}
Here is what happens. The first time we run it, the file does not exist, so fopen() creates it with mode 0666 (read/write for everyone), since the umask is 0. The chmod() call then sets the mode to 0755. The next time we run it, the file already exists and keeps its previous mode 0755 without question: fopen() only applies a mode when it creates a file. Now suppose we change the value of umask to 077 above:
umask(077);
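If we delete the file and start again, the file is created with mode 0600 (0666 with the bits in 077 removed), and it is then changed to 0755, because chmod() sets the mode exactly and is not affected by the umask.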
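The umask arithmetic can be written out as a one-line function (effective_mode is a hypothetical name). fopen() asks for mode 0666 when it creates a file, and the bits set in the umask are cleared from that:

```c
#include <sys/types.h>

/* The permission bits a newly created file receives: the mode requested by
   the program with every bit that is set in the umask cleared. */
mode_t effective_mode(mode_t requested, mode_t mask)
{
   return requested & ~mask;
}
```

With umask 0 the result is 0666, as in the first run above; with umask 077 it is 0600.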

We see how the inheritance of attributes can be a dangerous problem: the permissions on a file are ambiguous unless we set them explicitly. A bug in the Solaris startup files before Solaris 2.6 made server processes like ftpd run with a zero umask. That meant that all private files transferred with ftp became writable by everyone, mode 666!
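One way to remove the ambiguity, sketched under the assumption of a POSIX system: create the file with open() using O_CREAT | O_EXCL and an explicit mode, then set the mode with fchmod() so that the inherited umask no longer matters. The helper name create_private_file is hypothetical. Because of O_EXCL the call fails instead of silently reusing an existing file with whatever permissions it had; O_EXCL also makes open() fail if the final path component is a symbolic link.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create path for writing with mode 0600, regardless of the inherited umask.
   Fails (errno EEXIST) if the file already exists.
   Returns an open file descriptor, or -1 on error. */
int create_private_file(const char *path)
{
   int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);

   if (fd == -1)
      {
      return -1;
      }
   /* The umask may have removed bits from 0600; fchmod() sets the mode
      explicitly so that the result does not depend on it at all. */
   if (fchmod(fd, 0600) == -1)
      {
      close(fd);
      unlink(path);
      return -1;
      }
   return fd;
}
```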

Buffer overflows

Buffer handling of input streams is an extremely pernicious problem. If you don't get it right from the beginning, by always limiting the amount of data copied into a buffer to the size of that buffer, it will come back to haunt you again and again. (Many, many programmers, including me, have been burned by this.) The same applies whenever we manipulate strings in memory in any way.
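One way to build the habit is to route every string copy through a helper that is told the size of the destination and reports truncation. The sketch below uses a hypothetical safe_copy(); it is similar in spirit to BSD's strlcpy(), but it is not a standard C function.

```c
#include <string.h>

/* Copy src into dst, which has room for size bytes, always leaving dst
   '\0'-terminated. Returns 0 if the whole string fit, 1 if it was
   truncated, and -1 if size is 0 (nothing can be stored). */
int safe_copy(char *dst, size_t size, const char *src)
{
   size_t len = strlen(src);

   if (size == 0)
      {
      return -1;
      }
   if (len >= size)
      {
      memcpy(dst, src, size - 1);   /* copy what fits ... */
      dst[size - 1] = '\0';         /* ... and terminate */
      return 1;
      }
   memcpy(dst, src, len + 1);       /* includes the terminating '\0' */
   return 0;
}
```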

Trusting input

Here is an example from an early version of Mosaic. In order to handle telnet sessions, the authors simply wrote
sprintf(buf,"telnet %s",url);
system(buf);
This did not respond well to URLs of the form
telnet://host.example.com;rm -rf *
Even a check to block the semicolon is not enough, since system() passes the whole string to a shell, which also interprets other operators such as &&:
telnet://host.example.com&&rm -rf *
This applies generally to programs which run other programs as child processes: e.g. when programs mail warnings to users:
sprintf (command,"/usr/bin/mail %s",mailaddress);
system(command);
If we set mailaddress to someone;rm -rf * then we have the same problem again. Recall also the buffer overflow examples from the exercises. Besides checking input and avoiding the shell, the surest way around this kind of problem is to restrict the privilege of a process so that it cannot cause damage. We shall be looking at that in the next lecture.

Similar weaknesses can be used to attack web servers on which arbitrary server-side commands can be executed, or through poorly written CGI programs. We should never trust input if we are going to do anything important with it.
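Rather than blocking known-bad characters such as ; one at a time (the && example shows why that fails), a program can whitelist what it knows to be good. The sketch below, with the hypothetical name hostname_is_safe, accepts only letters, digits, dots and hyphens in a host name, and rejects a leading hyphen so that the name cannot be mistaken for a command-line option. Even with such a check, starting the child without a shell remains the safer design.

```c
#include <ctype.h>
#include <string.h>

#define MAX_HOST 253   /* upper bound on the length of a DNS name */

/* Whitelist check: accept only letters, digits, '.' and '-', which is all
   a host name needs. Anything else (';', '&', '|', '`', '$', spaces, ...)
   is rejected instead of being passed on to a shell. */
int hostname_is_safe(const char *host)
{
   size_t i, len = strlen(host);

   if (len == 0 || len > MAX_HOST || host[0] == '-')
      {
      return 0;                 /* empty, too long, or looks like an option */
      }
   for (i = 0; i < len; i++)
      {
      unsigned char c = (unsigned char) host[i];

      if (!isalnum(c) && c != '.' && c != '-')
         {
         return 0;
         }
      }
   return 1;
}
```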

Functions with intrinsic problems

gets()

The gets() function in the C library gets a string from standard input and places the result in a buffer:
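char buffer[1024];

gets(buffer);   /* gets() is never told how big buffer is */
There is no bounds checking in this function. The user can type as much as he/she likes and the buffer will overflow! gets() is so dangerous that it was removed from the C standard in C11; fgets(), which takes the size of the buffer as an argument, should be used instead.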
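A common replacement reads with fgets(), which is told the size of the buffer. The sketch below wraps it in a hypothetical helper, read_line(), that strips the newline and discards the remainder of an over-long line, so that the next call starts on a fresh line.

```c
#include <stdio.h>
#include <string.h>

/* Read one line from fp into buf (size bytes) with fgets(), which never
   writes more than size - 1 characters plus '\0'. The newline is removed.
   Returns 0 for a complete line, 1 if the line was too long (the rest of
   it is discarded), and -1 at end of file or on error. */
int read_line(char *buf, size_t size, FILE *fp)
{
   char *nl;
   int c;

   if (size < 2 || fgets(buf, (int) size, fp) == NULL)
      {
      return -1;
      }
   nl = strchr(buf, '\n');
   if (nl != NULL)
      {
      *nl = '\0';                 /* whole line fitted: drop the newline */
      return 0;
      }
   c = fgetc(fp);                 /* no newline: look at what follows */
   if (c == '\n' || c == EOF)
      {
      return 0;                   /* line fitted exactly, or was the last */
      }
   while (c != '\n' && c != EOF)  /* too long: discard the rest of it */
      {
      c = fgetc(fp);
      }
   return 1;
}
```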

The scanf() family

The scanf() family of functions (scanf, fscanf and sscanf) is a fantastic set of tools in the C library, with truly powerful features. Like all powerful tools, it can result in powerful blunders. A typical call looks like this:
int i;
long L;
char ch;
char buffer[1024];

scanf ("%d %ld %c %s",&i,&L,&ch,buffer);  /* for instance */
Programmers have to remember that scanf takes pointers to the variables it reads into, not the variables themselves. The C language does not require this to be checked, so even the smallest slip can produce a bad memory pointer. On modern operating systems with proper memory protection (Unix/NT), mixing up a pointer with the contents of a variable usually causes a fatal error such as a segmentation fault, but if the value happens to be a valid address, memory is silently overwritten. In DOS, on the classic Macintosh and on similar systems without memory protection, it can cause corruption without any fatal error!

In BSD 4.3 and possibly other early systems there were a number of bugs in the implementation of scanf which made it intrinsically dangerous. This does not apply to modern implementations, but there are still many pitfalls. Programs should wear protective clothing when using powerful tools.

Let's look at some examples of common mistakes. Here is a coding error which the C language does not require the compiler to detect (modern compilers such as GCC and Clang warn about it when format checking, -Wformat, is enabled, but only if the warnings are switched on and heeded). Depending on the byte order and stack layout of the host, it can corrupt the stack or another local variable, because it writes four to eight bytes into a one-byte container:

char ch;

scanf("%ld",&ch);   /* Wrong: %ld stores a whole long into a 1-byte char */
Apart from such compiler warnings, the only defence against this kind of error is to be very careful. That makes it a problem, since humans are not good at spotting errors.

We also need to remember the & (address-of) operator:

int i;

scanf("%d",&i);    /* Correct */

scanf("%d",i);     /* Wrong! */
In the latter case, scanf trusts the value of i to be a pointer and attempts to use its value to store the input.

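When reading strings with scanf, we need to set limits on how much will be read into a buffer, to avoid overflows. With scanf this is done by giving a field width in the format, one less than the buffer size to leave room for the terminating '\0'. In the C++ stream library the equivalent is the std::setw manipulator, or reading into a std::string, which grows as needed. e.g.


char buffer[1024];

scanf("%1023s",buffer);   /* at most 1023 characters plus the terminating '\0' */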
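Writing the width by hand (1023 for a 1024-byte buffer) is easy to get out of step when the buffer size changes. A common preprocessor trick, sketched here with a hypothetical WORD_MAX limit and scan_word() helper, derives the width in the format string from the same macro as the buffer size:

```c
#include <stdio.h>

#define WORD_MAX 15               /* hypothetical limit on word length */
#define STR2(x) #x
#define STR(x) STR2(x)            /* STR(WORD_MAX) expands to "15" */

/* Read one whitespace-delimited word of at most WORD_MAX characters from
   input into word, which must hold WORD_MAX + 1 bytes. The format string
   becomes "%15s", so sscanf can never write past the end of word.
   Returns 1 if a word was read, 0 otherwise. */
int scan_word(const char *input, char word[WORD_MAX + 1])
{
   return sscanf(input, "%" STR(WORD_MAX) "s", word) == 1;
}
```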

In order to avoid a cascade of errors caused by the same bad input, we need to clear bad data from the input stream (see the exercise last week).
int var;

if (scanf("%d",&var) != 1)   /* scanf returns the number of items it converted */
   {
   /* recover */
   }
else
   {
   /* use var */
   }

If scanf fails to match the request, the offending input is left unread in the stream, and the next call would fail on it again. In that case it might be appropriate to throw away all input up to the end of the line:
int var;
int c;

if (scanf("%d",&var) != 1)
   {
   while ((c = fgetc(stdin)) != '\n' && c != EOF) /* purge; stop at EOF too */
      {
      }
   }
else
   {
   /* use var */
   }
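An alternative that avoids scanf's stream state altogether is to read a whole line (for example with fgets()) and then convert it with strtol(), which reports where it stopped and whether the value overflowed. This is a different technique from the scanf approach above; parse_int is a hypothetical helper:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Convert the whole of s to an int. Returns 1 and stores the value in *out
   on success; returns 0 if s has no digits, has trailing junk, or is out
   of range for an int. Leading whitespace is skipped, as strtol() does,
   and a single trailing newline (as left by fgets) is accepted. */
int parse_int(const char *s, int *out)
{
   char *end;
   long v;

   errno = 0;
   v = strtol(s, &end, 10);
   if (end == s)
      {
      return 0;                   /* no digits at all */
      }
   if (*end == '\n')
      {
      end++;                      /* allow the newline fgets() keeps */
      }
   if (*end != '\0')
      {
      return 0;                   /* trailing junk such as "12abc" */
      }
   if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
      {
      return 0;                   /* too large or too small */
      }
   *out = (int) v;
   return 1;
}
```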


Hidden dependency

There are few programs today which do not rely on shared libraries. Virtually every Unix program depends on libc.so and virtually every Windows program relies on DLLs. If these libraries are replaced by Trojan horses, many things can go wrong.

On many Unix systems, the dynamic linker looks for shared libraries first in the directories listed in the environment variable LD_LIBRARY_PATH. Suppose a user sets this variable to include a directory which is publicly writable:

setenv LD_LIBRARY_PATH /tmp:$LD_LIBRARY_PATH
or the user's current directory
setenv  LD_LIBRARY_PATH .:$LD_LIBRARY_PATH
It is now trivial for an attacker to install a Trojan horse in place of a crucial library. If this is the superuser's account, the system is in deep trouble! Older versions of Windows did not protect their DLLs at all, so they could be replaced with Trojan horses trivially.
Secure applications can be linked statically with a trusted library to avoid this.
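Before trusting a library search path, a careful program or administrator's script could check it. The sketch below (ldpath_is_safe is a hypothetical helper) rejects empty or relative entries and any directory that does not exist or is writable by others. It is only a partial check: it does not examine parent directories, group write permission or ownership.

```c
#include <string.h>
#include <sys/stat.h>

/* Returns 1 if every directory in a colon-separated library path list is
   absolute, exists, and is not writable by "others"; returns 0 otherwise.
   Empty entries and relative entries such as "." are rejected. */
int ldpath_is_safe(const char *list)
{
   char dir[4096];
   const char *p = list;

   for (;;)
      {
      const char *end = strchr(p, ':');
      size_t len = end ? (size_t) (end - p) : strlen(p);
      struct stat st;

      if (len == 0 || len >= sizeof dir || p[0] != '/')
         {
         return 0;               /* empty, too long, or relative */
         }
      memcpy(dir, p, len);
      dir[len] = '\0';
      if (stat(dir, &st) == -1 || !S_ISDIR(st.st_mode)
          || (st.st_mode & S_IWOTH))
         {
         return 0;               /* missing, not a directory, or world-writable */
         }
      if (end == NULL)
         {
         return 1;
         }
      p = end + 1;
      }
}
```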


Thought for the week

String handling has replaced console input in nearly all applications which are written today, since GUI environments do not support console abstractions for dialogue boxes. String handling is one of the most useful things you can learn!
