Measuring Adaptation in Chaotic Environments

The chaotic adaptation test measures how quickly a filter adapts to an environment in which new, previously unlearned data is presented. It is very different from the accuracy range test: it is designed to measure learning speed rather than accuracy. For this reason, the classification results are used only to determine learning speed and are not reported as accuracy figures.

Test Criteria

The test criteria for a chaotic test are deliberately relaxed between corpora. Because we are measuring adaptation speed, the test requires an abrupt, chaotic change in message context between training and testing. Most of the criteria from the accuracy range test still apply within each corpus, but the test relies on those criteria breaking down between the training corpus and every test corpus.

Message continuity

No original message thread may be shared between the training corpus and any test corpus. Each corpus should preserve its own original threads and message ordering, while the context should differ sharply from one corpus to the next. It is generally a good idea to use a different test subject for each corpus.
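Whether a set of corpora meets this requirement can be checked mechanically. The following Python sketch assumes each message is a dict with a hypothetical `thread_id` key (which might, for example, be derived from the `References` or `In-Reply-To` headers) and reports every pair of corpora that share a thread. It checks all pairs, which is slightly stricter than comparing only the training corpus against each test corpus.

```python
from itertools import combinations


def shared_threads(corpora):
    """Return (i, j, shared_ids) for every pair of corpora that share a thread.

    Each corpus is a list of messages; each message is assumed to be a dict with a
    hypothetical 'thread_id' key. Corpus 0 is taken to be the training corpus.
    """
    thread_sets = [{msg["thread_id"] for msg in corpus} for corpus in corpora]
    conflicts = []
    for i, j in combinations(range(len(thread_sets)), 2):
        overlap = thread_sets[i] & thread_sets[j]
        if overlap:
            conflicts.append((i, j, overlap))
    return conflicts


training = [{"thread_id": "a1"}, {"thread_id": "a2"}]
test_1 = [{"thread_id": "b1"}]
test_2 = [{"thread_id": "a2"}, {"thread_id": "c1"}]
print(shared_threads([training, test_1, test_2]))  # [(0, 2, {'a2'})]
```

An empty result means no thread crosses a corpus boundary; any reported pair would need to be rebuilt before running the test.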

Archive window

The archive window is generally short for this test, and its exact length is left to the tester. At a minimum, the window should span the amount of mail the filter is estimated to need in order to learn.

Purge simulation

A purge simulation should be used, but with more relaxed purge thresholds that allow stale data to remain in the dataset for prolonged periods. If purging takes place too quickly, data learned from earlier corpora will be gone before it can be relearned, and the test will measure only learning speed rather than relearning speed.
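A purge simulation of this kind might be sketched as follows. The `TokenStats` record, the `purge_stale` helper, and its `stale_after` and `min_hits` thresholds are all hypothetical; real filters apply their own purge rules. In this sketch, relaxing the purge simply means using a longer `stale_after` window.

```python
from dataclasses import dataclass

DAY = 86_400  # seconds


@dataclass
class TokenStats:
    hits: int         # times the token has been seen in training (hypothetical field)
    last_seen: float  # timestamp of the most recent sighting, in seconds


def purge_stale(tokens, now, stale_after, min_hits):
    """Delete tokens unseen for more than `stale_after` seconds that have fewer than `min_hits` hits.

    `tokens` maps token -> TokenStats and is modified in place.
    Returns the number of tokens removed.
    """
    stale = [tok for tok, st in tokens.items()
             if now - st.last_seen > stale_after and st.hits < min_hits]
    for tok in stale:
        del tokens[tok]
    return len(stale)


db = {"pills": TokenStats(3, 0.0), "meeting": TokenStats(50, 0.0),
      "lunch": TokenStats(2, 20 * DAY)}
print(purge_stale(dict(db), now=30 * DAY, stale_after=7 * DAY, min_hits=5))   # 2
print(purge_stale(dict(db), now=30 * DAY, stale_after=60 * DAY, min_hits=5))  # 0
```

Under the strict setting, rarely seen tokens are already gone after a week or so; under the relaxed setting they survive, so stale data from an earlier corpus is still available to be relearned.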

Interleave

The original interleave of the messages (the order in which legitimate mail and spam were received) should be used. If the interleave is not available, a best estimate may be used, or multiple tests can be run for each test simulation, with the three best and three worst results averaged.
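When the original interleave is unknown, one possible estimate is a random merge of the legitimate and spam streams that keeps each stream in its original order. The sketch below draws such a merge uniformly at random and averages the three best and three worst results from repeated runs. The function names are hypothetical, and uniform random interleaving is only one way to make a best estimate.

```python
import random


def random_interleave(ham, spam, rng):
    """Merge two time-ordered lists into one, preserving each list's internal order.

    Each slot is filled from ham or spam with probability proportional to how many
    messages each list still holds, so every possible interleave is equally likely.
    """
    merged, i, j = [], 0, 0
    while i < len(ham) or j < len(spam):
        remaining_ham, remaining_spam = len(ham) - i, len(spam) - j
        if rng.random() * (remaining_ham + remaining_spam) < remaining_ham:
            merged.append(ham[i])
            i += 1
        else:
            merged.append(spam[j])
            j += 1
    return merged


def average_extremes(results, k=3):
    """Average the k best and k worst results together.

    Because both extremes are included, the sort direction does not matter.
    """
    if len(results) < 2 * k:
        raise ValueError("need at least 2*k results so best and worst sets do not overlap")
    ordered = sorted(results)
    extremes = ordered[:k] + ordered[-k:]
    return sum(extremes) / len(extremes)


rng = random.Random(42)
print(random_interleave(["h1", "h2", "h3"], ["s1", "s2"], rng))
message_counts = [120, 95, 140, 110, 100, 130, 105]  # hypothetical results of 7 runs
print(average_extremes(message_counts))  # 115.0
```

Each run of the test would use a fresh interleave drawn from the same generator, and the per-run results are then passed to `average_extremes`.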

Corrective training delay

This delay depends on the tester and on the specific purpose of the test. If the test is designed to measure only the filter’s ability to relearn, corrections should be made immediately. If the test is meant to reflect real-world chaotic adaptation, use a realistic delay that simulates how long one or more test subjects would take to correct errors.
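The corrective training delay can be modeled as a queue of misclassified messages, each released for retraining once its simulated delay has passed. In this sketch the `CorrectionQueue` class and its `delay` parameter (in seconds) are hypothetical; a delay of 0 corresponds to immediate correction.

```python
import heapq


class CorrectionQueue:
    """Holds misclassified messages until their simulated correction time arrives.

    `delay` is a hypothetical corrective training delay in seconds: 0 models an
    immediate correction, larger values model a user who reviews mail later.
    """

    def __init__(self, delay):
        self.delay = delay
        self._pending = []  # heap of (due_time, seq, message)
        self._seq = 0       # tie-breaker so messages themselves are never compared

    def schedule(self, message, timestamp):
        heapq.heappush(self._pending, (timestamp + self.delay, self._seq, message))
        self._seq += 1

    def due(self, now):
        """Pop and return every message whose correction time is <= now, earliest first."""
        ready = []
        while self._pending and self._pending[0][0] <= now:
            ready.append(heapq.heappop(self._pending)[2])
        return ready


q = CorrectionQueue(delay=3600)
q.schedule("msg-1", 0)
q.schedule("msg-2", 1800)
print(q.due(3000))    # []
print(q.due(3600))    # ['msg-1']
print(q.due(10_000))  # ['msg-2']
```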

Performing the Test

The chaotic adaptation test consists of the following events:

Training period

The period during which the initial corpus of messages is trained into the filter.

Adaptation period

The period during which a complete set of previously unseen messages is presented for classification, and the time the filter takes to learn to classify them accurately is measured.

Corrective period

The period during which misclassified messages are presented for retraining.

Purge period

The periods during which purging of stale data is simulated.

Listing 10-2 outlines the entire process.

Listing 10-2: Process flow of a chaotic adaptation test

let messageCount = 0
while messageCount < minCount or timePeriod < minPeriod
do
    present next message for training
    increment messageCount
    if timeElapsed > nextPurgeInterval
    then
        perform purge simulation
    fi
done
foreach test corpus
do
    let messageCount = 0
    let timeStart = timestamp of first message in corpus
    while more messages in corpus
    do
        present next message for classification
        increment messageCount
        if classification is wrong
        then
            determine nextInsertionPoint for correction
        else
            determine accuracy for previous N results
            if accuracy >= minThreshold
            then
                report messageCount
                report timestamp - timeStart
                break (corpus adapted; continue with next corpus)
            fi
        fi
        if timeElapsed > nextInsertionPoint
        then
            submit erroneous message for retraining
        fi
    done
done
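For readers who want to experiment, the process flow in Listing 10-2 can be sketched in Python. This is a simplified illustration under several assumptions: messages are hypothetical `(timestamp, text, label)` tuples, the filter is any object exposing hypothetical `train` and `classify` methods, purge simulation is left out, and accuracy is evaluated only once the window holds N results. `ToyFilter` is a deliberately trivial stand-in, not a Bayesian filter.

```python
from collections import deque


def chaotic_adaptation_test(flt, training, test_corpora, window, min_threshold,
                            correction_delay=0):
    """Return one result per test corpus: (message_count, seconds) or None.

    None means the window accuracy never reached min_threshold. Corrections still
    pending when a corpus is adapted are discarded in this sketch.
    """
    for _, text, label in training:                  # training period
        flt.train(text, label)
    results = []
    for corpus in test_corpora:                      # adaptation period, one corpus at a time
        time_start = corpus[0][0]
        recent = deque(maxlen=window)                # 1/0 for the last N classifications
        pending = deque()                            # (due_time, text, label) awaiting retraining
        outcome = None
        for count, (ts, text, label) in enumerate(corpus, start=1):
            # corrective period: retrain errors whose simulated delay has elapsed
            while pending and pending[0][0] <= ts:
                _, p_text, p_label = pending.popleft()
                flt.train(p_text, p_label)
            correct = flt.classify(text) == label
            recent.append(1 if correct else 0)
            if not correct:
                pending.append((ts + correction_delay, text, label))
            elif len(recent) == window and sum(recent) / window >= min_threshold:
                outcome = (count, ts - time_start)
                break                                # corpus adapted; move to the next one
        results.append(outcome)
    return results


class ToyFilter:
    """Hypothetical stand-in filter: each word votes for the label it was last trained with."""

    def __init__(self):
        self.votes = {}

    def train(self, text, label):
        for word in text.split():
            self.votes[word] = label

    def classify(self, text):
        labels = [self.votes[w] for w in text.split() if w in self.votes]
        spam = labels.count("spam")
        return "spam" if spam > len(labels) - spam else "ham"


training = [(0, "cheap pills", "spam"), (1, "team meeting", "ham")]
test = [(100, "casino bonus", "spam"), (160, "casino bonus", "spam"),
        (220, "casino bonus", "spam"), (280, "lunch plans", "ham")]
print(chaotic_adaptation_test(ToyFilter(), training, [test], window=2, min_threshold=1.0))
# [(3, 120)]
```

Requiring a full window before testing the threshold avoids declaring a corpus learned after one or two lucky classifications. Because the correction delay is constant and timestamps are in order, a plain FIFO deque is enough to keep pending corrections sorted by due time.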

This test begins in the same way as the accuracy range test: an initial training corpus of messages is trained into the filter. Each test corpus is then tested in turn. As the messages in a corpus are presented for classification, the accuracy of the previous N results is calculated, where N is a window size chosen by the tester. As the filter learns from its mistakes, this accuracy gradually increases. When the accuracy over the window meets or exceeds the minimum threshold set by the tester, the filter is considered to have sufficiently learned and adapted to the corpus.

Once a corpus has been sufficiently adapted to, the chaotic adaptation test moves on to the next test corpus, which should be contextually different from the previous one. The same measurements are taken and reported.

When the test has completed, a message count and a time delta (computed from the timestamps in the headers of the test messages) are reported for each chaotic transition. Whichever result the tester is most interested in can then be averaged across all corpora to determine the average message count or time period required for chaotic learning to take place.
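Averaging the per-transition results might look like the sketch below. It assumes each test corpus produced a hypothetical `(message_count, seconds)` tuple, or `None` if the threshold was never reached. Excluding such corpora from the average is an assumption the tester may want to revisit, since it hides corpora the filter failed to adapt to.

```python
from statistics import mean


def average_transition(results):
    """Average message count and time delta over the chaotic transitions that converged.

    `results` holds one (message_count, seconds) tuple per test corpus, or None for a
    corpus whose window accuracy never reached the threshold (excluded here).
    Returns (mean_count, mean_seconds), or None if no corpus converged.
    """
    converged = [r for r in results if r is not None]
    if not converged:
        return None
    return (mean(c for c, _ in converged), mean(s for _, s in converged))


print(average_transition([(40, 3600), (60, 5400), None]))  # (50, 4500)
```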

Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification

Jonathan A. Zdziarski
