7.6. Metamorphic VirusesVirus writers still must often waste weeks or months to create a new polymorphic virus that does not have chance to appear in the wild because of its bugs. On the other hand, a researcher might be able to deal with the detection of such a virus in a few minutes or few days. One of the reasons for this is that there are a surprisingly low number of efficient external polymorphic engines.20. 7.6.1. What Is a Metamorphic Virus?Igor Muttik explained metamorphic viruses in the shortest possible way: "Metamorphics are body-polymorphics." Metamorphic viruses do not have a decryptor or a constant virus body but are able to create new generations that look different. They do not use a data area filled with string constants but have one single-code body that carries data as code.21. Metamorphic computer viruses have the ability to change their shape by themselves from one form to another, but they usually avoid generating instances that are very close to their parent shape.Figure 7.4 illustrates the problem of metamorphic virus bodies as multiple shapes. Figure 7.4. The virus body keeps changing in different generations of a metamorphic virus.![]() 7.6.2. Simple Metamorphic VirusesIn December of 1998, Vecna (a notorious virus writer) created the W95/Regswap virus. Regswap implements metamorphosis via register usage exchange. Any part of the virus body will use different registers but the same code. The complexity of this, clearly, is not very high. Listing 7.10 shows some sample code fragments selected from two different generations of W95/Regswap that use different registers. Listing 7.10. Two Different Generations of W95/RegswapThe bold areas show the common areas of the two code generations. Thus a wildcard string could be useful in detecting the virus. Moreover, support for half-byte wildcard (indicated with the ? mark) bytes such as 5? B? (as described by Frans Veldman) could lead to even more accurate detection. Using the 5?B? wildcard pattern we can detect snippets such as 5ABF, 58BB, and so on.Depending on the actual ability of the scanning engine, however, such a virus might need an algorithmic detection because of the missing support of wildcard search strings. If algorithmic detection is not supported as a single database update, the product update might not come out for several weeksor monthsfor all platforms!Other virus writers tried to re-create older permutation techniques. For instance, the W32/Ghost virus has the capability to reorder its subroutines similarly to the BadBoy DOS virus family (see Figure 7.5). Badboy always starts in the entry point (EP) of the virus. Figure 7.5. The Badboy virus uses eight modules.![]() 7.6.3. More Complex Metamorphic Viruses and Permutation TechniquesThe W32/Evol virus appeared in July of 2000. The virus implements a metamorphic engine and can run on any major Win32 platform. In Listing 7.11, section a. shows a sample code fragment, mutated in b. to a new form in a new generation of the same virus. Even the "magic" DWORD values (5500000Fh, 5151EC8Bh) are changed in subsequent generations of the virus, as shown in c. Therefore any wildcard strings based on these values will not detect anything above the third generation of the virus. W32/Evol's engine is capable of inserting garbage between core instructions. Listing 7.11. Different Generations of the W32/Evol VirusVariants of the W95/Zperm family appeared in June and September of 2000. The method used is known from the Ply DOS virus. The virus inserts jump instructions into its code. The jumps will be inserted to point to a new instruction of the virus. The virus body is built in a 64K buffer that is originally filled with zeros. The virus does not use any decryption. In fact, it will not regenerate a constant virus body anywhere. Instead, it creates new mutations by the removal and addition of jump instructions and garbage instructions. Thus there is no way to detect the virus with search strings in the files or in the memory.Most polymorphic viruses decrypt themselves to a single constant virus body in memory. Metamorphic viruses, however, do not. Therefore the detection of the virus code in memory needs to be algorithmic because the virus body does not become constant even there. Figure 7.6 explains the code structure changes of Zperm-like viruses. Figure 7.6. The Zperm virus.[View full size image] ![]() Figure 7.7. Snapshot of a decrypted "polymorphic virus."![]() 7.6.4. Mutating Other Applications: The Ultimate Virus Generator?Not only does W95/Bistro itself mutate in new generations, it also mutates the code of its host by a randomly executed code-morphing routine. In this way, the virus might generate new worms and viruses. Moreover, the virus cannot be perfectly repaired because the entry-point code area of the application could be different. The code sequence at the entry point of the host application is mutated for a range 480 bytes long. The next listing shows an original and a permutated code sequence of a typical entry point code:Original entry-point code: Permutated entry-point code: Thus an instruction such as test esi, esi can be replaced by or esi, esi, its equivalent format. A push ebp; mov ebp, esp sequence (very common in high-level language applications) can be permutated to push ebp; push esp, pop ebp. It would certainly be more complicated to replace the code with different opcode sizes, but it would be possible to shorten longer forms of some of the complex instructions and include do-nothing code as a filler. This is a problem for all scanners.If a virus or a 32-bit worm were to implement a similar morphing technique, the problem could be major. New mutations of old viruses and worms would be morphed endlessly! Thus, a virtually endless number of not-yet-detectable viruses and worms would appear without any human intervention, leading to the ultimate virus generator.NoteAn even more advanced technique was developed in the W95/Zmist virus22, which is described in the following section.At the end of 1999, the W32/Smorph Trojan was developed. It implements a semimetamorphic technique to install a backdoor in the system. The standalone executable is completely regenerated during the installation of the Trojan. The PE header is re-created and will include new section names and section sizes. The actual code at the entry point is metamorphically generated. The code allocates memory and then decrypts its own resources, which contain a set of other executables. The Trojan uses API calls to its own import address table, which is filled with many nonessential API imports, as well as some essential ones. Thus everything in the standalone Trojan code will be different in new generations. 7.6.5. Advanced Metamorphic Viruses: ZmistDuring Virus Bulletin 2000, Dave Chess and Steve White of IBM demonstrated their research results on undetectable viruses. Shortly after, the Russian virus writer, Zombie, released his Total Zombification magazine, with a set of articles and viruses of his own. One of the articles in the magazine was titled "Undetectable Virus Technology."Zombie has already demonstrated his set of polymorphic and metamorphic virus-writing skills. His viruses have been distributed for years in source format, and other virus writers have modified them to create new variants. Certainly this is the case with Zombie's "masterpiece" creation, W95/Zmist.Many of us have not seen a virus approach this complexity for some time. We could easily call Zmist one of the most complex binary viruses ever written. W95/SK, One_Half, ACG, and a few other virus names popped into our minds for comparison. Zmist is a little bit of everything: It is an entry-point obscuring (EPO) virus that is metamorphic. Moreover, the virus randomly uses an additional polymorphic decryptor.The virus supports a unique new technique: code integration. The Mistfall engine contained in the virus is capable of decompiling PE files to their smallest elements, requiring 32MB of memory. Zmist will insert itself into the code; it moves code blocks out of the way, inserts itself, regenerates code and data references (including relocation information), and rebuilds the executable. This is something that has never been seen in any previous virus.Zmist occasionally inserts jump instructions after every single instruction of the code section, each of which points to the next instruction. Amazingly, these horribly modified applications will still run as before, just as the infected executables do, from generation to generation. In fact, we have not seen a single crash during test replications. Nobody expected this to worknot even the virus's author Zombie. Although it is not foolproof, it seems to be good enough for a virus. It takes some time for a human to find the virus in infected files. Because of this extreme camouflage, Zmist is easily the perfect antiheuristic virus.They say a good picture is worth a thousand words. The T-1000 model from the film Terminator 2 is the easiest analogy to use. Zmist integrates itself into the code section of the infected application as the T-1000 model could hide itself on the floor. 7.6.5.1 InitializationZmist does not alter the entry point of the host. Instead, it merges with the existing code, becoming part of the instruction flow. However, the code's random location means that sometimes the virus will never receive control. If the virus does run, it will immediately launch the host as a separate process and hide the original process (if the RegisterServiceProcess() function is supported on the current platform) until the infection routine completes. Meanwhile, the virus will begin searching for files to infect. 7.6.5.2 Direct Action InfectionAfter launching the host process, the virus will check whether there are at least 16MB of physical memory installed. The virus also checks that it is not running in console mode. If these checks pass, it will allocate several memory blocks (including a 32MB area for the Mistfall workspace), permutate the virus body, and begin a recursive search for PE files. This search will take place in the Windows directory and all subdirectories, the directories referred to by the PATH environment variable, and then all fixed or remote drives from A: to Z:. This is a rather brute-force approach to spreading. 7.6.5.3 PermutationThe permutation is fairly slow because it is done only once per infection of a machine. It consists of instruction replacement, such as the reversing of branch conditions, register moves replaced by push/pop sequences, alternative opcode encoding, XOR/SUB and OR/TEST interchanging, and garbage instruction generation. It is the same engine, RPME, that is used in several viruses, including W95/Zperm, which also was written by Zombie. 7.6.5.4 Infection of Portable Executable FilesA file is considered infectable if it meets the following conditions:
The virus will read the entire file into memory and then choose one of three possible infection types. With a one-in-ten chance, only jump instructions will be inserted between every existing instruction (if the instruction was not a jump already), and the file will not be infected. There is also a one in ten chance that the file will be infected by an unencrypted copy of the virus. Otherwise, the file will be infected by a polymorphically encrypted copy of the virus. The infection process is protected by structured exception handling, which prevents crashes if errors occur. After the rebuilding of the executable, the original file is deleted, and the infected file is created in its place. However, if an error occurs during the file creation, the original file is lost, and nothing will replace it.The polymorphic decryptor consists of "islands" of code that are integrated into random locations throughout the host code section and linked by jumps. The decryptor integration is performed in the same way as for the virus body integrationexisting instructions are moved to either side, and a block of code is placed between them. The polymorphic decryptor uses absolute references to the data section, but the Mistfall engine will update the relocation information for these references, too. An antiheuristic trick is used for decrypting the virus code: Instead of making the section writeable to alter its code directly, the host is required to have, as one of the first three sections, a section containing writeable, initialized data. The virtual size of this section is increased by 32KB, large enough for the decrypted body and all variables used during decryption. This allows the virus to decrypt code directly into the data section and transfer control there.If such a section cannot be found, the virus will infect the file without using encryption. The decryptor receives control in one of four ways:
If one of the first three methods is used, the transfer of control will appear soon after the entry point. In the case of the last method, though, an island of the decryptor is simply inserted into the middle of a subroutine somewhere in the code (including before the entry point). All used registers are preserved before decryption and restored afterward, so the original code will behave as before. Zombie calls this last method UEP, perhaps an acronym for "unknown entry point" because there is no direct pointer anywhere in the file to the decryptor.When encryption is used, the code is encrypted with ADD, SUB, or XOR with a random key, which is altered on each iteration by ADD/SUB/XOR with a second random key. Between the decryption instructions are various garbage instructions. They use a random number of registers and a random choice of loop instruction, all produced by the executable trash generator (ETG) engine, which was also written by Zombie. Randomness features heavily in this virus. 7.6.5.5 Code IntegrationThe integration algorithm requires that the host have fixups to distinguish between offsets and constants. After infection, however, the fixup data is not required by the virus. Therefore, though it is tempting to look for a gap of about 20KB in the fixup area (which would suggest that the virus body is located there), it would be dangerous to rely on this during scanning.If another application (such as one of an increasing number of viruses) were to remove the fixup data, the infection would be hidden. The algorithm also requires that the name of each section in the host be one of the following: CODE, DATA, AUTO, BSS, TLS, .bss, .tls, .CRT, .INIT, .text, .data, .rsrc, .reloc, .idata, .rdata, .edata, .debug, or DGROUP. These section names are produced by the most common compilers and assemblers in use: those of Microsoft, Borland, and Watcom. The names are not visible in the virus code because the strings are encrypted.A block of memory is allocated that is equivalent to the size of the host memory image, and each section is loaded into this array at the section's relative virtual address. The location of every interesting virtual address is noted (import and export functions, resources, fixup destinations, and the entry point), and then the instruction parsing begins.This is used to rebuild the executable. When an instruction is inserted into the code, all following code and data references must be updated. Some of these references might be branch destinations, and in some cases the size of these branches will increase as a result of the modification. When this occurs, more code and data references must be updated, some of which might be branch destinations, and the cycle repeats. Fortunatelyat least from Zombie's point of viewthis regression is not infinite; although a significant number of changes might be required, the number is limited. The instruction parsing consists of identifying the type and length of each instruction. Flags are used to describe the types, such as instruction is an absolute offset requiring a fixup entry, instruction is a code reference, and so on.In some cases, an instruction cannot be resolved in an unambiguous manner to either code or data. In that case, Zmist will not infect the file. After the parsing stage is completed, the mutation engine is called, which inserts the jump instructions after every instruction or generates a decryptor and inserts the islands into the file. Then the file is rebuilt, the relocation information is updated, the offsets are recalculated, and the file checksum is restored. If overlay data is appended to the original file, then it is copied to the new file too. 7.6.6. { W32, Linux} /Simile: A Metamorphic Engine Across SystemsW32/Simile is the latest "product" of the developments in metamorphic virus code. The virus was released in the 29A #6 issue in early March 2002. It was written by the virus writer who calls himself The Mental Driller. Some of his previous viruses, such as W95/Drill (which used the Tuareg polymorphic engine), were already very challenging to detect.W32/Simile moves yet another step in complexity. The source code of the virus is approximately 14,000 lines of Assembly code. About 90% of the virus code is spent on the metamorphic engine itself, which is extremely powerful. The virus's author has called it MetaPHOR, which stands for "metamorphic permutating high-obfuscating reassembler."The first-generation virus code is about 32KB, and there are three known variants of the virus in circulation. Samples of the variant initially released in the 29A issue have been received by certain AV companies from major corporations in Spain, suggesting a minor outbreak.W32/Simile is very obfuscated and challenging to understand. The virus attacks disassembling, debugging, and emulation techniques, as well as standard evaluating-based virus-analysis techniques. As with many other complex viruses, Simile uses EPO techniques. 7.6.6.1 Replication RoutineSimile contains a fairly basic direct-action replication mechanism that attacks PE files on the local machine and the network. The emphasis is clearly on the metamorphic engine, which is unusually complex. 7.6.6.2 EPO MechanismThe virus searches for and replaces all possible patterns of certain call instructions (those that reference ExitProcess() API calls) to point to the beginning of the virus code. Thus the main entry point is not altered. The metamorphic virus body is sometimes placed, together with a polymorphic decryptor, into the same location of the file. In other cases, the polymorphic decryptor is placed at the end of the code section, and the virus body is not placed there but in another section. This is to further obfuscate the location of the virus body. 7.6.6.3 Polymorphic DecryptorDuring the execution of an infected program, when the instruction flow reaches one of the hooks that the virus places in the code section, control is transferred to a polymorphic decryptor responsible for decoding the virus body (or simply copying it directly, given that the virus body is intentionally not always encrypted).This decryptor, whose location in the file is variable, first allocates a large chunk of memory (about 3.5MB) and then proceeds to decipher the encrypted body into it. It does this in a most unusual manner: Rather than going linearly through the encrypted data, it processes it in a seemingly random order, thus avoiding the triggering of some decryption-loop recognition heuristics. This "pseudo-random index decryption" (as the virus writer calls it) relies on the use of a family of functions that have interesting arithmetic properties, modulo 2^n. Although the virus writer discovered this by trial and error, it is in fact possible to give a mathematical proof that his algorithm will work in all cases (provided the implementation is correct, of course). This proof was made by Frederic Perriot at Symantec.The size and appearance of the decryptor varies greatly from one virus sample to the next. To achieve this high variability, the virus writer simply generates a code template and then puts his metamorphic engine to work to transform the template into a working decryptor.Additionally, in some cases the decryptor might start with a header whose intent is not immediately obvious upon reading it. Further study reveals that its purpose is to generate antiemulation code on the fly. The virus constructs a small oligomorphic code snippet containing the instruction RDTSC (ReaD Time Stamp Counter), which retrieves the current value of an internal processor ticks counter. Then, based on one random bit of this value, the decryptor either decodes and executes the virus body or bypasses the decryption logic altogether and simply exits.Besides confusing emulators that would not support the somewhat peculiar RDTSC instruction (one of Mental Driller's favorites, already present in W95/Drill), this is also a very strong attack against all algorithms relying on emulation either to decrypt the virus body or to determine viral behavior heuristicallybecause it effectively causes some virus samples to cease infecting completely upon a random time condition.When the body of the virus is first executed, it retrieves the addresses of 20 APIs that it requires for replication and for displaying the payload. Then it will check the system date to see whether either of its payloads should activate. Both payloads require that the host import functions from USER32.DLL. If this is the case, the virus checks whether or not it should call the payload routine (which is explained further on). 7.6.6.4 MetamorphismAfter the payload check has been completed, a new virus body is generated. This code generation is done in several steps:
Not only can Simile expand as most first-generation metamorphic viruses did, it can also shrink (and shrink to different forms). Simile.D is capable of translating itself to different metamorphic forms (V1, V2…Vn) and does so on more than one operating system (O1,O2…On). So far, the virus does not use multiple CPU forms, but it could also introduce code translation and metamorphism for different processors (P1..Pn) in the future, as shown in Figure 7.8. Figure 7.8. Stages of Simile from Linux to Windows and from one generation to another.![]() 7.6.6.5 ReplicationThe replication phase begins next. It begins by searching for *.exe in the current directory and then on all fixed and mapped network drives. The infection will scan recursively into directories, but only to a depth of three subdirectories, avoiding completely any directory that begins with the letter W. For each file that is found, there is a 50% chance that it will be skipped. Additionally, files will be skipped if they begin with F-, PA, SC, DR, or NO or contain the letter V anywhere in the name. Due to the nature of the comparison, other character combinations are unintentionally skipped, such as any directory that begins with the number 7, any file that begins with FM, or any file that contains the number 6 anywhere in the name.The file-infection routine contains many checks to filter files that cannot be infected safely. These checks include such conditions as the file must contain a checksum, it must be an executable for the Intel 386+ platform, and there must exist sections whose names are .text or CODE and .data or DATA. The virus also checks that the host imports some kernel functions, such as ExitProcess.For any file that is considered infectable, random factors and the file structure will determine where the virus places its decryptor and virus body. If the file contains no relocations the virus body will be appended to the last section in the file. (Apparently, this can happen randomly with a small chance as well, regardless if there are any relocations in the file.)In this case, the decryptor will be placed either immediately before the virus body or at the end of the code section. Otherwise, if the name of the last section is .reloc, the virus will insert itself at the beginning of the data section, move all of the following data, and update all of the offsets in the file. 7.6.6.6 PayloadThe first payload activates only during March, June, September, or December. Variants A and B of W32/Simile display their messages on the 17th of these months. Variant C will display its message on the 18th of these months. Variant A will display "Metaphor v1 by The Mental Driller/29A," and variant B will display "Metaphor 1b by The Mental Driller/29A." Variant C intends to display "Deutsche Telekom by Energy 2002 **g**"; however, the author of that variant had little understanding of the code, and the message rarely appears correctly. In all cases, the cases of the letters are mixed randomly, as shown in Figure 7.9. Figure 7.9. The "metamorphic" activation routines of the Simile virus.a.) Variant A ![]() ![]() ![]() ![]() Figure 7.10. The payload of the Simile.D virus on Linux.[View full size image] ![]() 7.6.7. The Dark FutureMSIL Metamorphic VirusesAs discussed in this chapter, some new MSIL viruses, such as MSIL/Gastropod, already support semi-metamorphic (permutating) code generation under the .NET Framework. Such viruses have a great advantage because they do not need to carry their own source code. Instead, viruses can simply decompile themselves to generate new binaries, for example, by using the System.Reflection.Emit namespace. Listing 7.12 illustrates the MSIL metamorphic engine of the MSIL/Gastropod virus in two generations. Listing 7.12. Different Generations of the MSIL/Gastropod VirusThe extracted snippets represent the class constructor (".cctor") function of MSIL/Gastropod in two different generations. The semi-metamorphic engine of the virus obfuscates class names and method names23. In addition, the permutation engine also inserts and removes junk instructions (such as nop) into its body. Indeed, it is not well known to MSIL developers that they can invoke a compiler and generate code and assembly from a running application, but virus writers already use this feature.MSIL/Gastropod is a code-builder virus: It reconstructs its own host program with itself. This method allows the virus body to be placed to an unpredictable position within the host program. The main entry-point method of the host program is replaced by the entry-point method of the virus. When the infected host is executed, the virus code is invoked, which will run the original main entry-point method eventually. ![]() |