THE ART OF COMPUTER VIRUS RESEARCH AND DEFENSE [Electronic resources]

Peter Szor

نسخه متنی -صفحه : 191/ 13

1.1. Early Models of Self-Replicating Structures

Humans create new models to represent our world from different perspectives. The idea of self-replicating systems that model self-replicating structures has been around since the Hungarian-American, Neumann JE1nos (John von Neumann), suggested it in 1948²^, ³^, ⁴.⁵.

In the traditional von Neumann machine, there was no basic difference between code and data. Code was differentiated from data only when the operating system transferred control and executed the information stored there.

To create a more secure computing system, we will find that system operations that better control the differentiation of data from code are essential. However, we also will see the weaknesses of such approaches.

Modern computers can simulate nature using a variety of modeling techniques. Many computer simulations of nature manifest themselves as games. Modern computer viruses are somewhat different from these traditional nature-simulation game systems, but students of computer virus research can appreciate the utility of such games for gaining an understanding of self-replicating structures.

1.1.1. John von Neumann: Theory of Self-Reproducing Automata

Replication is an essential part of life. John von Neumann was the first to provide a model to describe nature's self-reproduction with the idea of self-building automata.

In von Neumann's vision, there were three main components in a system:

A Universal Machine
A Universal Constructor
Information on a Tape

A universal machine (Turing Machine) would read the memory tape and, using the information on the tape, it would be able to rebuild itself piece by piece using a universal constructor. The machine would not understand the processit would simply follow the information (blueprint instructions) on the memory tape. The machine would only be able to select the next proper piece from the set of all the pieces by picking them one by one until the proper piece was found. When it was found, two proper pieces would be put together according to the instructions until the machine reproduced itself completely.

If the information that was necessary to rebuild another system could be found on the tape, then the automata was able to reproduce itself. The original automata would be rebuilt (Figure 1.1), and then the newly built automata was booted, which would start the same process.

Figure 1.1. The model of a self-building machine.

A few years later, Stanislaw Ulam suggested to von Neumann to use the processes of cellular automation to describe this model. Instead of using "machine parts," states of cells were introduced. Because cells are operated in a robotic fashion according to rules ("code"), the cell is known as an automaton. The array of cells comprises the cellular automata (CA) computer architecture.⁶ developed by artificial life researchers, such as Christopher G. Langton, in 1979. Such replication loops eliminate the complexity of universal machine from the system and focus on the needs of replication.⁷ conducted research on a self-replicating, growing lunar factory. A lunar manufacturing facility (LMF) was researched, which used the theory of self-reproducing automata and existing automation technology to make a self-replicating, self-growing factory on the moon. Robert A. Freitas, Jr. and Ralph C. Merkle recently authored a book titled Kinematic Self-Replicating Machines. This book indicates a renewed scientific interest in the subject. A few years ago, Freitas introduced the term ecophagy, the theoretical consumption of the entire ecosystem by out of control, self-replicating nano-robots, and he proposed mitigation recommendations⁸.

It is also interesting to note that the theme of self-replicating machines occurs repeatedly in works of science fiction, from movies such as Terminator to novels written by such authors as Neal Stephenson and William Gibson. And of course, there are many more examples from beyond the world of science fiction, as nanotech and microelectrical mechanical systems (MEMS) engineering have become real sciences.

1.1.2. Fredkin: Reproducing Structures

Several people attempted to simplify von Neumann's model. For instance, in 1961 Edward Fredkin used a specialized cellular automaton in which all the structures could reproduce themselves and replicate using simple patterns on a grid (see ⁹:

On the table, we use the same kind of tokens.
We either have a token or no token in each possible position.
Token generations will follow each other in a finite time frame.
The environment of each token will determine whether we will have a new token in the next generation.
The environment is represented by the squares above, below, to the left, and to the right of the token (using the 5-cell-based von Neumann environment).
The state of a square in the next generation will be empty when the token has an even number of tokens in its environment.
The state of a square in the next generation will be filled with a token if it has an odd number of tokens in its environment.
It is possible to change the number of states.

Figure 1.2. Generation 1, Generation 2, and…Generation 4.

Using the rules described previously with this initial layout allows all structures to replicate. Although there are far more interesting layouts to explore, this example is the simplest possible model of self-reproducing cellular automata.

1.1.3. Conway: Game of Life

In 1970, John Horton Conway¹⁰ created one of the most interesting cellular automata systems. Just as the pioneer von Neumann did, Conway researched the interaction of simple elements under a common rule and found that this could lead to surprisingly interesting structures. Conway named his game Life. Life is based on the following rules:

There should be no initial pattern for which there is a single proof that the population can grow without limit.
There should be an initial pattern that apparently does grow without limit.
There should be simple initial patterns that work according to simple genetic law: birth, survival, and death.

¹¹.

Figure 1.3. Edwin Martin's Game of Life implementation on the Mac using "Shooter" starting structure.

It is especially interesting to see the computer animation as the game develops with the so-called "Shooter" starting structure. In a few generations, two shooter positions that appear to shoot to each other will develop on the sides of the table, as shown in Figure 1.4, and in doing so they appear to produce so-called gliders that "fly" away (see Figure 1.5) toward the lower-right corner of the table. This sequence continues endlessly, and new gliders are produced.

Figure 1.4. "Shooter" in Generation 355.

Figure 1.5. The glider moves around without changing shape.

On a two-dimensional table, each cell has two potential states: S=1 if there is one token in the cell, or S=0 if there is no token. Each cell will live according to the rules governed by the cell's environment (see Figure 1.6).

Figure 1.6. The 9-cell-based Moore environment.

The following characteristics/rules define Conway's game, Life:

Birth: If an empty cell has three (K=3) other filled cells in its environment, that particular cell will be filled in a new generation.
Survival: If a filled cell has two or three (K=2 or K=3) other filled cells in its environment, that particular cell will survive in the new generation.
Death: If a filled cell has only one or no other filled cells (K=1 or K=0) in its environment, that particular cell will die because of isolation. Further, if a cell has too many filled cells in its environmentfour, five, six, seven, or eight (K=4, 5, 6, 7, or 8), that particular cell will also die in the next generation due to overpopulation.

Conway originally believed that there were no self-replicating structures in Life. He even offered $50 to anyone who could create a starting structure that would lead to self-replication. One such structure was quickly found using computers at the artificial intelligence group of the Massachusetts Institute of Technology (MIT).

MIT students found a structure that was later nicknamed a glider. When 13 gliders meet, they create a pulsing structure. Later, in the 100^th generation, the pulsing structure suddenly "gives birth" to new gliders, which quickly "fly" away. After this point, in each 30^th subsequent generation, there will be a new glider on the table that flies away. This sequence continues endlessly. This setup is very similar to the "Shooter" structure shown in ¹² exists between computer viruses and antivirus programs. As antivirus systems have become more sophisticated, so have computer viruses. This tendency has continued over the more than 30-year history of computer viruses.

Using models along these lines, we can see how the virus population varies according to the number of computers compatible with them. When it comes to computer viruses and antiviral programs, multiple parallel games occur side by side. Viruses within an environment that consists of a large number of compatible computers will be more virulent; that is, they will spread more rapidly to many more computers. A large number of similar PCs with compatible operating systems create a homogeneous environmentfertile ground for virulence (sound familiar?).

With smaller game boards representing a smaller number of compatible computers, we will obviously see smaller outbreaks, along with relatively small virus populations.

This sort of modeling clearly explains why we find major computer virus infections on operating systems such as Windows, which represents about 95% of the current PC population around us on a huge "grid." Of course this is not to say that 5% of computer systems are not enough to cause a global epidemic of some sort.

Note

If you are fascinated by self-replicating, self-repairing, and evolving structures, visit the BioWall project, http://lslwww.epfl.ch/biowall/163.

1.1.4. Core War: The Fighting Programs

Around 1966, Robert Morris, Sr., the future National Security Agency (NSA) chief scientist, decided to create a new game environment with two of his friends, Victor Vyssotsky and Dennis Ritchie, who coded the game and called it Darwin. (Morris, Jr. was the first infamous worm writer in the history of computer viruses. His mark on computer virus history will be discussed later in the book.)¹³^,¹⁴ that discussed Core War, beginning with the May 1984 article. Figure 1.7 is a screen shot of a Core War implementation called PMARSV, written by Albert Ma, Na'ndor Sieben, Stefan Strack, and Mintardjo Wangsaw. It is interesting to watch as the little warriors fight each other within the MARS environment.

Figure 1.7. Core Wars warrior programs (Dwarf and MICE) in battle.

¹⁴.

The simplest Redcode program consists of only one MOV instruction: MOV 0,1 (in the traditional syntax). This program is named IMP, which causes the contents at relative address 0 (namely the MOV, or move, instruction itself), to be transferred to relative address 1, just one address ahead of itself. After the instruction is copied to the new location, control is given to that address, executing the instruction, which, in turn, makes a new copy of itself at a higher address, and so on. This happens naturally, as instructions are executed following a higher address. The instruction counter will be incremented after each executed instruction.

The basic core consisted of two warrior programs and 8,000 cells for instructions. Newer revisions of the game can run multiple warriors at the same time. Warrior programs are limited to a specific starting size, normally 100 instructions. Each program has a finite number of iterations; by default, this number is 80,000.

The original version of Redcode supported 10 instructions. Later revisions contain more. For example, the following 14 instructions are used in the 1994 revision, shown in Listing 1.1.

Listing 1.1. Core War Instructions in the 1994 Revision

DATA0 data
MOVA0 move
ADDA0 add
SUBA0 subtract
MULA0 multiply
DIVA0 divide
MODA0 modula
JMPA0 jump
JMZA0 jump if zero
JMNA0 jump if not zero
DJNA0 decrement, jump if not zero
CMPA0 compare
SLTA0 skip if less than
SPLA0 split execution

Let's take a look at Dewdney's Dwarf tutorial (see Listing 1.2).

Listing 1.2. Dwarf Bombing Warrior Program

;name          Dwarf
;author        A. K. Dewdney
;version       94.1
;date          April 29, 1993
;strategy      Bombs every fourth instruction.
ORG     1 ; Indicates execution begins with the second
; instruction (ORG is not actually loaded, and is
; therefore not counted as an instruction).
DAT.F   #0, #0     ; Pointer to target instruction.
ADD.AB  #4, $-1    ; Increments pointer by 4.
MOV.AB  #0, @-2    ; Bombs target instruction.
JMP.A   $-2, #0    ; Loops back two instructions.

Dwarf follows a so-called bombing strategy. The first few lines are comments indicating the name of the warrior program and its Redcode 1994 standard. Dwarf attempts to destroy its opponents by "dropping" DAT bombs into their operation paths. Because any warrior process that attempts to execute a DAT statement dies in the MARS, Dwarf will be a likely winner when it hits its opponents.

The MOV instruction is used to move information into MARS cells. (The IMP warrior explains this very clearly.) The general format of a Redcode command is of the Opcode A, B form. Thus, the command MOV.AB #0, @-2 will point to the DAT statement in Dwarf's code as a source.

The A field points to the DAT statement, as each instruction has an equivalent size of 1, and at 0, we find DAT #0, #0. Thus, MOV will copy the DAT instruction to where B points. So where does B point to now?

The B field points to DAT.F #0, #0 statement in it. Ordinarily, this would mean that the bomb would be put on top of this statement, but the @ symbol makes this an indirect pointer. In effect, the @ symbol says to use the contents of the location to where the B field points as a new pointer (destination). In this case, the B field appears to point to a value of 0 (location 0, where the DAT.F instruction is placed).

The first instruction to execute before the MOV, however, is an ADD instruction. When this ADD #4, $-1 is executed, the DAT's offset field will be incremented by four each time it is executedthe first time, it will be changed from 0 to 4, the next time from 4 to 8, and so on.

This is why, when the MOV command copies a DAT bomb, it will land four lines (locations) above the DAT statement (see Listing 1.3).

Listing 1.3. Dwarf's Code When the First Bomb Is Dropped

0          DAT.F #0, #8
1 ->    ADD.AB 4, $-1
2          MOV.AB #0, @-2 ; launcher
3          JMP.A $-2, #0
4          DAT ; Bomb 1
5          .
6          .
7          .
8          DAT ; Bomb 2
9          .

The JMP.A $-2 instruction transfers control back relative to the current offset, that is, back to the ADD instruction to run the Dwarf program "endlessly." Dwarf will continue to bomb into the core at every four locations until the pointers wrap around the core and return. (After the highest number possible for the DAT location has been reached, it will "wrap" back around past 0. For example, if the highest possible value were 10, 10+1 would be 0, and 10+4 would be 3.)

At that point, Dwarf begins to bomb over its own bombs, until the end of 80,000 cycles/iterations or until another warrior acts upon it. At any time, another warrior program might easily kill Dwarf because Dwarf stays at a constant locationso that it can avoid hitting itself with friendly fire. But in doing so, it exposes itself to attackers.

There are several common strategies in Core War, including scanning, replicating, bombing, IMP-spiral (those using the SPL instruction), and the interesting bomber variation named the vampire.

Dewdney also pointed out that programs can even steal their enemy warrior's very soul by hijacking a warrior execution flow. These are the so-called vampire warriors, which bomb JMP (JUMP) instructions into the core. By bombing with jumps, the enemy program's control can be hijacked to point to a new, predefined location where the hijacked warrior will typically execute useless code. Useless code will "burn" the cycles of the enemy warrior's execution threads, thus giving the vampire warrior an advantage.

Instead of writing computer viruses, I strongly recommend playing this harmless and interesting game. In fact, if worms fascinate you, a new version of Core War can be created to link battles in different networks and allow warrior programs to jump from one battle to another to fight new enemies on those machines. Evolving the game to be more networked allows for simulating worm-like warrior programs.