11.5. Metamorphic Virus Detection ExamplesThere is a level of metamorphosis beyond which no reasonable number of strings can be used to detect the code that it contains. At that point, other techniques must be used, such as examination of the file structure or the code stream, or analysis of the code's behavior.To detect a metamorphic virus perfectly, a detection routine must be written that can regenerate the essential instruction set of the virus body from the actual instance of the infection. Other products use shortcuts to try to solve the problem, but such shortcuts often lead to an unacceptable number of false positives. This section introduces some useful techniques. 11.5.1. Geometric DetectionGeometric detection17 is the virus-detection technique based on alterations that a virus has made to the file structure. It could also be called the shape heuristic because it is far from exact and prone to false positives. An example of a geometric detection is W95/Zmist. When this virus infects a file using its encrypted form, it increases the virtual size of the data section by at least 32KB but does not alter the section's physical size.Thus a file might be reported as being infected by W95/ZMist if the file contains a data section whose virtual size is at least 32KB larger than its physical size. However, such a file structure alteration also can be an indicator of a runtime-compressed file. File viruses often rely on a virus infection marker to detect already infected files and avoid multiple infections. Such an identifier can be useful to the scanner in combination with the other infection-induced geometric changes to the file. This makes geometric detection more reliable, but the risk of false positives only decreases; it never disappears. 11.5.2. Disassembling TechniquesTo assemble means to bring together, so to disassemble is to separate or take apart. In the context of code, to disassemble is to separate the stream into individual instructions. This is useful for detecting viruses that insert garbage instructions between their core instructions. Simple string searching cannot be used for such viruses because instructions can be quite long, and there is a possibility that a string can appear "inside" an instruction, rather than being the instruction itself. For example, suppose that one wished to search for the instruction CMP AX, "ZM." This is a common instruction in viruses, used to test whether a file is of the executable type. Its code representation is and it can be found in the stream However, when the stream is disassembled and displayed, notice that what was found is not the instruction at all: The use of a disassembler can prevent such mistakes, and if the stream were examined further when disassembled and displayed, it can be seen that the true string follows shortly after: When combined with a state machine, perhaps to record the order in which "interesting" instructions are encountered, and even when combined with an emulator, this technique presents a powerful tool that makes a comparatively easy task of detecting such viruses as W95/ZMist and the more recent W95/Puron19. (The Puron virus is based on the Lexotan engine.)Lexotan and W95/Puron execute the same instructions in the same order, with only garbage instructions and jumps inserted between the core instructions, and no garbage subroutines. This makes them easy to detect using only a disassembler and a state machine.Sample detection of W95/Puron is shown in Listing 11.12. Listing 11.12. Focusing the Scanning on "Interesting" InstructionsACG20, by comparison, is a complex metamorph that requires an emulator combined with a state machine. Sample detection is included in the next section. 11.5.3. Using Emulators for TracingEarlier in this chapter, emulation was discussed as being useful in detecting polymorphic viruses. It is very useful for working with viruses because it allows virus code to execute in an environment from which it cannot escape. Code that runs in an emulator can be examined periodically or when particular instructions are executed. For DOS viruses, INT 21h is a common instruction to intercept. If used properly, emulators are still very useful in detecting metamorphic viruses. This is explained better through the following examples. 11.5.3.1 Sample Detection of ACGListing 11.13 shows a short example code of an instance of ACG. Listing 11.13. A Sample Instance of ACGWhen the INT 21 is reached, the registers contain ah=4a and bx=1000. This is constant for one class of ACG viruses. Trapping enough similar instructions forms the basis for detection of ACG.Not surprisingly, several antivirus scanner products do not support such detection. This shows that traditional code emulation logic in older virus scanner engines might not be used "as-is" to trace code on such a level. All antivirus scanners should go in the direction of interactive scanning engine developments.An interactive scanning engine model is particularly useful in building algorithmic detections of the kind that ACG needs. 11.5.3.2 Sample Detection of EvolChapter 7 discussed the complexity of the Evol virus. Evol is a perfect example of a virus that deals with the problem of hiding constant data as variable code from generation to generation. Code tracing can be particularly useful in detecting even such a level of change. Evol builds the constant data on the stack from variable data before it passes it to the actual function or API that needs it.At a glance, it seems that emulation cannot deal with such viruses effectively. However, this is not the case. Emulators need to be used differently by allowing more flexibility to the virus researcher to control the operations of the emulator using a scanning language that can be used to write detection routines. Because viruses such as Evol often build constant data on the stack, the emulator can be instructed to run the emulation until a predefined limit of iterations and to check the content of the stack after the emulation for constant data built by the virus. The content of the stack can be very helpful in dealing with complex metamorphic viruses that often decrypt data on the stack. 11.5.3.3 Using Negative and Positive FeaturesTo speed detection, scanners can use negative detection. Unlike positive detection, which checks for a set of patterns that exist in the virus body, negative detection checks for the opposite. It is often enough to identify a set of instructions that do not appear in any instance of the actual metamorphic virus.Such negative detection can be used to stop the detection process when a common negative pattern is encountered. 11.5.3.4 Using Emulator-Based HeuristicsHeuristics have evolved much over the last decade21. Heuristic detection does not identify viruses specifically but extracts features of viruses and detects classes of computer viruses generically.22.Nowadays, it is easy to think of an almost perfect emulation of DOS, thanks to the computing speed of today's processors and the relatively simple single-threaded OS. However, it is more difficult to emulate Windows on Windows built into a scanner! Emulating multithreaded functionality without synchronization problems is a challenging task. Such a system cannot be as perfect as a DOS emulation because of the complexity of the OS. Even if we use a system like VMWARE to solve most of the challenges, many problems remain. Emulation of third-party DLLs is one problem that can arise. Such DLLs are not part of the VM and, whenever virus code relies on such an API set, the emulation of the virus will likely break.23 do not infect without an active Internet connection. What if the virus looks for ![]() |