11.7. Heuristic Analysis Using Neural Networks Several researchers have attempted to use neural networks to detect computer viruses. Neural networks are a sub-field of artificial intelligence26, 27, so the subject is very exciting. Difficult polymorphic EPO viruses such as Zhengxi have been detected successfully using a trained neural network28.29 and Win32 viruses30.One of the key problems of any heuristic is the false positive ratio. If the heuristic is too alarming, people will not use it. IBM researchers demonstrated that single-layer classifiers yield the best results with a voting system. 31.Figure 11.8. Single-layer classifier with threshold.
Neural networks can easily be overtrained, which is a pitfall of the method. Overtrained networks remember the training set extremely well, but they do not work with new sample sets. In other words, they fail to detect new viruses. To eliminate this problem, multiple neural networks are trained using distinct features. In addition, a voting system is used so that more than one network must agree about a positive detection. In the first experiments, IBM used four neural networks with voting, but it turned out that the best result was achieved when five networks out of eight agreed on a positive.The basic idea of the training is the selection of n-grams (sequences a couple of bytes long) of the constant part of viruses that indicate an infection. The selection of n-grams for neural network training is the unique feature of IBM's solution. For example, 4-byte sequences can be used to train the network. To train the networks better, a corpus database is used to check whether the n-grams extracted from the constant virus body areas of known computer viruses appear more than a threshold T. If the threshold is exceeded, the n-gram is not used. |