THE ART OF COMPUTER VIRUS RESEARCH AND DEFENSE [Electronic resources] (text version)

Peter Szor

  • 11.7. Heuristic Analysis Using Neural Networks


    Several researchers have attempted to use neural networks to detect computer viruses. Neural networks are a sub-field of artificial intelligence [26, 27], so the subject is very exciting. Difficult polymorphic EPO viruses such as Zhengxi have been detected successfully using a trained neural network [28, 29], and Win32 viruses have been detected as well [30].

    One of the key problems of any heuristic is its false positive rate. If the heuristic raises too many false alarms, people will not use it. IBM researchers demonstrated that single-layer classifiers yield the best results when combined with a voting system [31].

    Figure 11.8. Single-layer classifier with threshold.
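
    A single-layer classifier with a threshold can be sketched in a few lines. This is only an illustration of the general idea, not IBM's implementation: the function name, the weights, and the feature encoding (1 if a given n-gram is present in the sample, 0 otherwise) are all assumptions made for the example.

```python
from typing import Sequence


def single_layer_classify(
    features: Sequence[float],
    weights: Sequence[float],
    threshold: float,
) -> bool:
    """Return True (suspicious) when the weighted feature sum exceeds the threshold."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    # One layer only: a weighted sum followed by a hard threshold.
    activation = sum(w * x for w, x in zip(weights, features))
    return activation > threshold


# Hypothetical weights for three n-gram features (values chosen for illustration).
weights = [0.5, 0.25, 0.75]

print(single_layer_classify([1, 0, 1], weights, threshold=1.0))  # 1.25 > 1.0, prints True
print(single_layer_classify([0, 1, 0], weights, threshold=1.0))  # 0.25 <= 1.0, prints False
```

    Raising the threshold makes the classifier more conservative, which trades missed detections for fewer false positives.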

    Neural networks can easily be overtrained, which is a pitfall of the method. Overtrained networks remember the training set extremely well, but they do not work with new sample sets. In other words, they fail to detect new viruses. To eliminate this problem, multiple neural networks are trained using distinct features. In addition, a voting system is used so that more than one network must agree about a positive detection. In the first experiments, IBM used four neural networks with voting, but it turned out that the best result was achieved when five networks out of eight agreed on a positive.
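
    The voting rule described above can be sketched as follows. The eight classifiers here are simple stand-ins (each one just looks for a different made-up byte pattern); in a real system each would be a separately trained network. The names `vote`, `min_agree`, and the patterns are hypothetical.

```python
from typing import Callable, Sequence

Classifier = Callable[[bytes], bool]


def vote(sample: bytes, classifiers: Sequence[Classifier], min_agree: int) -> bool:
    """Report a detection only if at least min_agree classifiers say 'positive'."""
    positives = sum(1 for classify in classifiers if classify(sample))
    return positives >= min_agree


# Eight stand-in classifiers, each keyed to a distinct (made-up) two-byte pattern.
patterns = [bytes([i]) * 2 for i in range(1, 9)]
classifiers = [lambda s, p=p: p in s for p in patterns]

five_hits = b"".join(patterns[:5])  # triggers five of the eight classifiers
four_hits = b"".join(patterns[:4])  # triggers only four

print(vote(five_hits, classifiers, min_agree=5))  # prints True
print(vote(four_hits, classifiers, min_agree=5))  # prints False
```

    Requiring agreement from several classifiers trained on distinct features means that a single overtrained or mistaken network cannot raise an alarm on its own.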

    The basic idea of the training is the selection of n-grams (short byte sequences) from the constant parts of viruses that indicate an infection. The selection of n-grams for neural network training is the unique feature of IBM's solution. For example, 4-byte sequences can be used to train the network. To train the networks better, a corpus database is used to check how often the n-grams extracted from the constant body areas of known computer viruses appear in the corpus. If an n-gram appears more than a threshold T times, it is not used, since such a common sequence is more likely to produce false positives.
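
    The n-gram filtering step might look like the sketch below. It makes several assumptions that the text does not specify: the corpus is modeled as a list of byte strings, every occurrence in the corpus is counted (rather than the number of files containing the n-gram), and the helper names `extract_ngrams` and `select_ngrams` are invented for the example.

```python
from collections import Counter
from typing import Iterable, Set


def extract_ngrams(data: bytes, n: int = 4) -> Set[bytes]:
    """Return the set of distinct n-byte sequences found in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def select_ngrams(
    virus_bodies: Iterable[bytes],
    corpus: Iterable[bytes],
    n: int = 4,
    threshold: int = 0,
) -> Set[bytes]:
    """Keep virus n-grams that occur at most `threshold` times in the corpus."""
    # Count every occurrence of every n-gram in the corpus (an assumption;
    # counting per file would be an equally plausible reading).
    corpus_counts: Counter = Counter()
    for program in corpus:
        for i in range(len(program) - n + 1):
            corpus_counts[program[i:i + n]] += 1

    candidates: Set[bytes] = set()
    for body in virus_bodies:
        candidates |= extract_ngrams(body, n)

    # An n-gram whose corpus count exceeds the threshold T is discarded.
    return {gram for gram in candidates if corpus_counts[gram] <= threshold}


# Toy data: b"ABCD" occurs twice in the corpus, b"BCDE" never does.
selected = select_ngrams([b"ABCDE"], [b"ABCDxx", b"zzABCD"], n=4, threshold=1)
print(selected)  # prints {b'BCDE'}
```

    The surviving n-grams would then serve as the input features for training, one feature per selected sequence.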

