|
The following is an analysis of a worst case virulence for a computer
worm, using existing mechanisms and a modified infection strategy. Such
a "Warhol Worm" could infect every vulnerable machine in a 15 minute time
period, outpacing human defense. It is important to understand the possible
threat, in order to develop better defenses.
"In the future, everybody will have 15 minutes of fame"
-Andy Warhol
The recent outbreak of the Code Red active worm has demonstrated how
vulnerable our infrastructure is. But the worm could have been a thousand
times worse. It could have contained a malicious payload: corrupting data,
reflashing BIOSes, and potentially destroying machines. It could have
included attacks for different servers, or a secondary email component,
to increase its reach.
But although it was fast, the 12 hours it took to reach epidemic levels
still allows for an organized response. But by simply changing the infection
pattern, it is possible for a malicious programmer to build a "Warhol
Worm", able to attack all vulnerable machines, worldwide, in 15 minutes.
A reactive, human defense would fail before such an onslaught. It is an
important exercise to realize just how vulnerable we are.
An active worm such as Code Red or the original Morris worm takes advantage
of a security hole in a server. It scans through the Internet, looking
for machines running that service. Then it tries to break into that service.
If successful, it infects the target machine with another copy of itself.
Over a period of several hours, it goes from an initial machine to Internet
wide contamination. The difference between an active worm and a Warhol
worm is minor, just a different strategy of choosing hosts to infect.
There are 4 billion Internet addresses, and let us assume that 1 million
machines are vulnerable a particular Warhol Worm. Assume that the worm
is under 100k in size and is multithreaded, allowing it to probe and attack
many machines simultaneously. Further assume that the targeted servers
have good network connections, allowing a running copy of the worm to
scan 100 machines per second, probe and infect 10 machines per second,
and it takes 1 second to actually transfer over the worm to a new target.
Finally, assume that a Warhol Worm author has planned ahead and quietly
constructed a list of 50,000 computers with good network connections and
a running copy of the targeted server, of which an unknown 25% are actually
infectible by the worm.
The worm starts out on a single machine, with its list of 50,000 targets.
The worm goes through the list in order, and probes and infects the machines.
When it successfully takes over another machine, it divides the list in
half, sends half over to the newly infected machine and keeps the other
half. This quick division of labor allows the worm to infect every vulnerable
machine on the list in under a minute.
Now roughly 12,000 machines will be infected, and the second stage begins.
The worm first attempts to infect all the hosts on its subnet, before
beginning to choose new targets in the general Internet. But instead of
just picking random machines, or scanning through the net in a linear
order, the worms use a pseudo-random order.
A Warhol worm contains a generator for a pseudo-random permutation of
all 4 billion Internet addresses. Each worm infected during the hitlist
phase starts at its address and goes through the permutation, looking
for new targets to infect, while each worm infected during the second
phase starts at a random location. If it finds another copy of itself
running, it picks a new, random address and starts from there. This will
have the worm behave with both the features of a random probe (scattering
through the net) and a sequential probe (minimizing duplication of effort
and guaranteeing that the entire Internet will be efficiently scanned).
Even if the worms did not infect any other machines during this phase,
they would take about 50-100 hours to scan the entire internet at 100
scans/machine/second. If the computers on the hit list are chosen for
speed and connectivity, this would be significantly lower.
But with 1 million vulnerable machines, any given scan and probe has
a .025% chance of being a vulnerable machine. Thus, with the 1.2 million
scans per second the initial worms send out, 300 will reveal new targets.
By the second minute after release, the worm will have infected a total
of 30,000 machines. After the third minute, there will be over 70,000
infected machines. It becomes quite obvious that complete infection will
be achieved within the 15 minute timeframe.
Normally, once a worm which uses random probes infects about 1/2 of the
available hosts, the rate of new infections slows down considerably. A
fully coordinated worm, where the tasks of scanning the internet are perfectly
divided, will only slow down once every target is infected. The pseudo
random/random combination is a compromise, allowing the worm to do a comprehensive
scan of the internet without relying on transmitting information between
worms, just the ability to check to see if a potential target is already
infected.
The Internet will slow down during this process, but not as much as may
be expected. With only a few bytes to scan a target to see if a server
is running, roughly 5k to infect a target, and 100k to transfer on successful
infection, the amount of data involved is surprisingly small: Even at
the peak of infection with a million infested machines, this represents
only a hundred million scans per second worldwide, a network load not
significantly higher than Code Red employed.
There is no current defense against a malicious Warhol worm. The author
of a Warhol worm could easily cause hundreds of billions in real damage,
through corrupted data, destroyed machines, and lost time. By the time
the world realizes what is happening, all the damage would be done.
Appendix 1: Justification of assumptions
100k worm size: 100 kilobytes of data is a reasonable size for
a worm: It is sufficient for the infection code and a small but effective
malicious payload. With some patience and cleverness, the size could probably
be reduced to 50k or less.
100 scans/second: Scanning a single machine to see if it is running
the vulnerable service requires only about a kilobit of data to be sent
and received, this only requires about a 100kbps link for each active
worm. Since server machines targeted by active worms tend to have good
network, this is easily achievable. Many machines should be capable of
1000 scans per second.
10 attacks/second: 10 attacks per second requires transfering
the whole worm over. For servers, 1 megabyte/second of data is a little
high for some but many machines do have that level of bandwidth. A lower
limit of one attack per second would still be sufficient, as it would
only slow the initial phase of expansion, perhaps to slightly more than
a minute. During the second phase, infection rates are limited by the
rate of probing, not the rate of attacking.
1 second to infect a machine: Taking over and infecting a machine
requires that the probe be successful, the worm transfered over, and the
worm to start running. Since the worm is only 100k, this can easily be
accomplished in a second for machines connected through broadband links.
Note that a latency of several seconds will not significantly slow down
the propagation during the permutation scanning phase, but only slows
down the hitlist phase.
1M vulnerable hosts: Code Red I and II seem to have infected somewhere
between 500,000 and 1,000,000 machines on the net. A fewer number of potential
hosts would slow down the rate of infection slightly, but even with only
500,000 vulnerable machines, the spread would still take roughly 15 minutes:
it would double every 2 minutes instead of every minute, until all vulnerable
machines are infected.
Initial hit list scan: The initial hit-list scan is a slow scan
which is done in advance, just to determine a set of potentially vulnerable
machines. In the case of a web server, a simple web-crawler could be used.
For a different service, another form of scanner. Such scans, since they
only are used to detect the program running, not the existence of the
vulnerability, can be done weeks in advance and would not attract notice.
Much of the information may already be publicly available.
In practice, it would be very hard to detect and trace the scan used
to create such a hitlist. The Honeynet project has revealed just how many
scans occur all the time. A slow, advanced scan just to determine the
version of a web-server and its response time would be nearly undetectable
amid all the noise generated by script kiddies.
Also, the hitlist can be created long before an actual vulnerability
is discovered. The attacker simply collects a catalog of machines running
potential targeted services, and constructs the worm with the exception
of the actual exploitation code. Once an exploit is available is the final
worm compiled and released, using the premade hitlist.
Pseudo random scanning: The pseudo-random-permutation is easy
to construct: Just use a 32 bit block cipher and a key selected in advance.
To generate the Nth entry in the permutation, just encrypt N. To get the
index of a given machine K, just decrypt K.
This does result in some redundancy in scanning, as multiple worms may
probe a range in the permutation until an already infected machine is
found. However, this is a comparatively minor loss, and the rapid expansion
of the worm will cause a loss in efficiency, this is a comparatively minor
loss. Complete efficiency of scanning could be accomplished by having
the worms communicate with each other. Such communication would make such
a worm even more virulent, possibly allowing complete infection in 5 minutes.
Normally, a worm like Code Red has its infection rate start to slow as
it reaches saturation, as random probing becomes less effective. Because
this combines the effects of random and comprehensive probes, the rate
stay higher, longer.
Coordinated worms, and worms which exploit the structure of the net in
dividing the workload, would be even faster. The point of this exercise,
however, is to show that even very simple modifications to random infection
strategies can produce super virulence. Once you get a worm which propagates
in under an hour, what difference is there between 5 minutes and 15?
Appendix 2: Possible defenses
Unfortunately, Warhol worms are very fast, so human mediated defenses
fail. A transition to IPv6 would work wonders, since it eliminates the
random probing as a viable means of finding new hosts. Short of that,
more variation in applications, so individual flaws would affect fewer
hosts would help, since fewer vulnerable machines will slow the spread
of such worms.
Getting programmers to write their servers in bounds checked languages
would help, as this would eliminate roughly half of the security flaws.
But it isn't a panacea, there is nothing in a Warhol worm which requires
that the exploit used be a buffer overflow.
One factor which may slow the spread is the ability for routers to process
ARP requests. This will result in the effects of network congestion for
the worm, which will slow the spread. I have no current estimates on how
much this will affect things.
Thanks to Michael Constant for his assistance in helping design the
strategies used by such a worm.
I am currently writing an abstract simulator to model the virulence of
an abstract Warhol Worm for various parameters. I AM NOT, AND WILL NOT,
EVER WRITE SUCH A WORM! "I have no need to destroy things, just to prove
a point"
My math is deliberately sloppy when it comes to evaluating the point
where the geometric growth slows down. But the simulator confirms these
times. I will be releasing initial code for the simulator in a couple
of days, as well as data runs, but initial results say that it takes roughly
7 minutes and 30 seconds for 1M vulnerable hosts, 12 minutes and 45 seconds
if there are only 500k potential hosts in the network, as expected.
Why I'm keeping this page up!
Unfortunately, the results are too simple: Any reasonably intelligent
programmer who thinks about the problem will come up with a similar solution,
or a coordinated worm which efficiently divides the search space. Coordinated
worms are even faster but do require a level of worm to worm communication.
The point of this was to just demonstrate that coordinated worms are NOT
a prerequisite for truly fast propagation.
Secondly, the current worms, taking 12 hours to a day, are still fast
enough to foil most human reaction: Look at the continued spread of the
code red variants, roughly 3 weeks after initial infection. Changing this
to under an hour would only slightly increase the reach in the current
environment.
Finally, I am a personal believer in the doctrine of full disclosure,
especially when the "bad guys" (skilled, malicious programmers) could
easily come up with the same results without documentation. It is important
for the rest of us to consider and evaluate what the realistic threat
is. Saying nothing would help nobody.
We have always known, even before the Morris worm, that connected, homogeneous
computer networks are vulnerable to fast moving, active worms. This, however,
is an important result because it is a reminder of just HOW vulnerable
it is. Human mediated defenses can not stop a fast active worm.
I will not remove this page.
Copyright 2001 by Nicholas C Weaver, nweaver@cs.berkeley.edu. It may
be reproduced only in its entirety and with credit to its author.
The term "Warhol Worm" is an original term coined by the author.
If you mirror this article, please let me know where it is and from where
you got it. I am interested in how this article gets spread around.
|