Message Header Encodings
Message header encodings are designed to support different types of character sets. Unfortunately, they too are abused by spammers to trick unaware spam filters. Ironically, this trick works on quite a few filters, primarily legacy filters without header decoding logic. One of the big things heuristic filters look at is the subject line of a message to determine whether it is spam; most humans can determine whether a message is spam nine times out of ten just by looking at the subject line. If the filter sees words like “ADV:” or “Viagra” in the subject, it knows to can the message. Statistical filters are usually much more sensible and won’t can a message just because it has a guilty-looking header. But even a statistical filter can fail if it can’t decode message headers.
RFC 2047 outlines an encoding that can be used in message headers. For example, the filter might see:
Subject: =?iso-8859-1?b?SW1tZWRpYXRlIERlbGl2ZXJ5IG9mIFZpYWdyL2E=?=
but the recipient will see:
Subject: Immediate Delivery of Viagr/a
Note | The entire header doesn’t necessarily have to be encoded, and there can be multiple encodings per message header. The subject field isn’t the only field that can use this type of encoding either. Many spams have encoded To/From fields. |
When this type of encoding is used, the recipient sees only the original text that was encoded, not the encoded portion of the message. The problem is that an unaware spam filter may fail to decode the header and so never see the text that the recipient reads.
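As a rough illustration of what a header-aware filter needs to do before tokenizing, the sketch below decodes the example subject using Python's standard email.header module. The helper name decode_header_value is hypothetical, and the sketch assumes a well-formed header with a charset Python recognizes; a real filter would also need to handle malformed or unknown encodings.

```python
from email.header import decode_header, make_header


def decode_header_value(raw: str) -> str:
    """Return the text a mail client would display for a raw header value.

    Hypothetical helper for illustration; assumes well-formed input.
    """
    # decode_header splits the value into (bytes_or_str, charset) chunks;
    # make_header reassembles them, and str() yields the decoded text.
    return str(make_header(decode_header(raw)))


raw_subject = "=?iso-8859-1?b?SW1tZWRpYXRlIERlbGl2ZXJ5IG9mIFZpYWdyL2E=?="
decoded_subject = decode_header_value(raw_subject)
print(decoded_subject)  # Immediate Delivery of Viagr/a
```

A filter would then tokenize decoded_subject rather than the raw header value, so that it sees the same words the recipient does.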
Note | Unfortunately, the mere presence of encoded headers isn’t enough to determine whether a message is spam; many individuals who converse with people using a different character set (especially of a wide-character persuasion) will also be sending email with encoded headers. |
Message header encoding is fairly simple to detect and almost as easy to decode. RFC 2047 outlines the basic rules for header encoding.
Message headers support two primary encoding methods, and you’ve already heard about both of them. The first is Base64: the same algorithm described in RFC 2045 for decoding Base64 messages can be used to decode a Base64-encoded header phrase. The second encoding method is quoted-printable. The rules outlined in RFC 2045 also apply to quoted-printable encoded header phrases, except that newline characters are not permissible.
There are a few other rules to be aware of with regard to this approach. For details, download RFC 2047 from http://www.ietf.org/rfc/rfc2047.txt.
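To make the detection and decoding steps concrete, here is a minimal hand-rolled sketch in Python. It is not a complete RFC 2047 implementation: the names ENCODED_WORD, has_encoded_word, and decode_encoded_words are hypothetical, the pattern is a simplification of the RFC grammar, and malformed Base64 or an unknown charset will raise an exception. It does reflect two of the additional RFC 2047 rules: an underscore in a quoted-printable phrase stands for a space, and whitespace between adjacent encoded phrases is ignored.

```python
import base64
import quopri
import re

# Simplified pattern for an encoded phrase: =?charset?encoding?text?=
# where encoding is "b" (Base64) or "q" (quoted-printable), in either case.
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=")


def has_encoded_word(value: str) -> bool:
    """Detect whether a header value contains any encoded phrase."""
    return ENCODED_WORD.search(value) is not None


def _decode_match(match: "re.Match[str]") -> str:
    charset, encoding, payload = match.groups()
    if encoding.lower() == "b":
        raw = base64.b64decode(payload)
    else:
        # header=True makes "_" decode to a space, as header encoding requires.
        raw = quopri.decodestring(payload.encode("ascii"), header=True)
    return raw.decode(charset, errors="replace")


def decode_encoded_words(value: str) -> str:
    """Replace every encoded phrase in a header value with its decoded text."""
    # Whitespace that only separates two encoded phrases is dropped.
    collapsed = re.sub(r"(\?=)\s+(=\?)", r"\1\2", value)
    return ENCODED_WORD.sub(_decode_match, collapsed)


b64_subject = "=?iso-8859-1?b?SW1tZWRpYXRlIERlbGl2ZXJ5IG9mIFZpYWdyL2E=?="
qp_subject = "=?iso-8859-1?q?Immediate_Delivery_of_Viagr/a?="
print(decode_encoded_words(b64_subject))
print(decode_encoded_words(qp_subject))
```

Both calls produce the same decoded subject, which is the point: whichever of the two encodings the sender picks, the filter ends up scoring the same words.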