Message Header Encodings - Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification


Jonathan A. Zdziarski


Message Header Encodings



Message header encodings are designed to let headers carry text in character sets other than plain ASCII. Unfortunately, they too are abused by spammers to trick unaware spam filters. The trick still works on quite a few filters, primarily legacy filters without header-decoding logic.

One of the big things heuristic filters look at is the subject line of a message to determine whether it is spam; most humans can determine whether a message is spam nine times out of ten just by looking at the subject line. If the filter sees words like “ADV:” or “Viagra” in the subject, it knows to can the message. Statistical filters are usually much more sensible and won’t can a message just because it has a guilty-looking header. But even a statistical filter can fail if it can’t decode message headers.

RFC 2047 outlines an encoding that can be used in message headers. For example, the filter might see:

Subject: =?iso-8859-1?b?SW1tZWRpYXRlIERlbGl2ZXJ5IG9mIFZpYWdyL2E=?=

but the recipient will see:

Subject: Immediate Delivery of Viagr/a
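As a minimal sketch of what a decoding-aware filter would do with this header, Python's standard `email.header` module can turn the encoded subject back into the text the recipient sees. The helper name `decode_subject` is an illustrative choice, not something taken from the book.

```python
from email.header import decode_header, make_header

# The encoded subject from the example above.
raw_subject = "=?iso-8859-1?b?SW1tZWRpYXRlIERlbGl2ZXJ5IG9mIFZpYWdyL2E=?="


def decode_subject(value: str) -> str:
    """Return the human-readable form of an RFC 2047 encoded header value."""
    # decode_header() yields (bytes, charset) pairs; make_header() joins
    # them back into a single Header object that str() renders as text.
    return str(make_header(decode_header(value)))


print(decode_subject(raw_subject))  # Immediate Delivery of Viagr/a
```

A filter would tokenize the decoded string rather than the raw header value.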





Note

The entire header doesn’t necessarily have to be encoded, and there can be multiple encodings per message header. The subject field isn’t the only field that can use this type of encoding either. Many spams have encoded To/From fields.
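To illustrate partially encoded headers, here is a small sketch that decodes a header value mixing plain text with an encoded phrase. The sample Subject and From values and the function name `decode_mixed_header` are made up for illustration; only `decode_header` comes from Python's standard library.

```python
from email.header import decode_header


def decode_mixed_header(value: str) -> str:
    """Decode a header that may mix plain text and encoded phrases."""
    pieces = []
    for chunk, charset in decode_header(value):
        # Chunks come back as bytes when the header contains any encoded
        # phrase, and as str when the header is entirely plain text.
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii", errors="replace")
        pieces.append(chunk)
    return "".join(pieces)


# Hypothetical headers: only part of each value is encoded.
subject = "Buy =?iso-8859-1?q?V=69agra?= now"
sender = "=?iso-8859-1?q?Onl=69ne_Pharmacy?= <sales@example.com>"

print(decode_mixed_header(subject))  # Buy Viagra now
print(decode_mixed_header(sender))   # Online Pharmacy <sales@example.com>
```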


When this type of encoding is used, the recipient sees only the original text that was encoded, not the encoded form that actually travels in the message. The problem is that an unaware spam filter may fail to decode the header, and so never sees the words it contains.





Note

Unfortunately, the mere presence of encoded headers isn’t enough to determine whether a message is spam; many individuals who converse with people using a different character set (especially of a wide-character persuasion) will also be sending email with encoded headers.


Message header encoding is fairly simple to detect and almost as easy to decode. RFC 2047 outlines the basic rules for header encoding.
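Detection can be sketched with a single regular expression. RFC 2047 encoded phrases have the form `=?charset?encoding?encoded-text?=`, where the encoding is `B` or `Q` in either case. The pattern and the name `find_encoded_words` below are illustrative assumptions, and the pattern is deliberately looser than the full RFC grammar.

```python
import re

# =?charset?encoding?encoded-text?=  (encoding is B or Q, case-insensitive)
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=")


def find_encoded_words(header_value: str) -> list:
    """Return (charset, encoding, payload) for each encoded phrase found."""
    return [
        (m.group(1), m.group(2).lower(), m.group(3))
        for m in ENCODED_WORD.finditer(header_value)
    ]


header = "Subject: =?iso-8859-1?b?SW1tZWRpYXRlIERlbGl2ZXJ5IG9mIFZpYWdyL2E=?="
print(find_encoded_words(header))
```

An empty result means the header carries no encoded phrases and can be tokenized as is.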

Message headers support two primary encoding methods, and you’ve already heard about both of them. The first is Base64: the same algorithm described in RFC 2045 for decoding Base64 messages can be used to decode a Base64-encoded header phrase. The second is quoted-printable. The rules outlined in RFC 2045 also apply to quoted-printable encoded header phrases, except that newline characters are not permissible.
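The two methods can be sketched as one small function that decodes a single phrase once its charset, encoding letter, and payload have been split out. The name `decode_encoded_word` is hypothetical. The `header=True` flag tells Python's `quopri` module to treat an underscore as a space, which is one of the header-specific rules in RFC 2047.

```python
import base64
import quopri


def decode_encoded_word(charset: str, encoding: str, payload: str) -> str:
    """Decode one encoded phrase given its three already-parsed fields."""
    kind = encoding.lower()
    if kind == "b":
        # Base64, same algorithm as for RFC 2045 message bodies.
        raw = base64.b64decode(payload)
    elif kind == "q":
        # Quoted-printable variant for headers: "_" stands for a space.
        raw = quopri.decodestring(payload.encode("ascii"), header=True)
    else:
        raise ValueError("unknown header encoding: " + encoding)
    # Convert the decoded bytes to text using the declared character set.
    return raw.decode(charset, errors="replace")


print(decode_encoded_word("iso-8859-1", "b",
                          "SW1tZWRpYXRlIERlbGl2ZXJ5IG9mIFZpYWdyL2E="))
print(decode_encoded_word("iso-8859-1", "q", "V=69agra_now"))
```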

There are a few other rules to be aware of with regard to this approach. For details, download RFC 2047 from http://www.ietf.org/rfc/rfc2047.txt.
