Python Cookbook 2nd Edition Jun 1002005 [Electronic resources]

David Ascher, Alex Martelli, Anna Ravenscroft

Recipe 13.8. Removing Attachments from an Email Message

Credit: Anthony Baxter

Problem

You're handling email in Python and need to remove from email messages any attachments that might be dangerous.

Solution

Regular expressions can help us identify dangerous content types and file extensions, and thus code a function to remove any potentially dangerous attachments:

ReplFormat = """
This message contained an attachment that was stripped out.
The filename was: %(filename)s,
The original type was: %(content_type)s
(and it had additional parameters of:
%(params)s)
"""

import re

BAD_CONTENT_RE = re.compile('application/(msword|msexcel)', re.I)
BAD_FILEEXT_RE = re.compile(r'(\.exe|\.zip|\.pif|\.scr|\.ps)$')

def sanitise(msg):
    ''' Strip out all potentially dangerous payloads from a message '''
    ct = msg.get_content_type()
    fn = msg.get_filename()
    if BAD_CONTENT_RE.search(ct) or (fn and BAD_FILEEXT_RE.search(fn)):
        # bad message-part, pull out info for reporting then destroy it
        # present the parameters to the content-type, list of key, value
        # pairs, as key=value forms joined by comma-space
        # (get_params() returns None when there is no Content-Type header)
        params = (msg.get_params() or [])[1:]
        params = ', '.join([ '='.join(p) for p in params ])
        # put informative message text as new payload
        replace = ReplFormat % dict(content_type=ct, filename=fn, params=params)
        msg.set_payload(replace)
        # now remove parameters and set contents in content-type header
        for k, v in (msg.get_params() or [])[1:]:
            msg.del_param(k)
        msg.set_type('text/plain')
        # Also remove headers that make no sense without content-type
        del msg['Content-Transfer-Encoding']
        del msg['Content-Disposition']
    else:
        # Now we check for any sub-parts to the message
        if msg.is_multipart():
            # Call sanitise recursively on any subparts
            payload = [ sanitise(x) for x in msg.get_payload() ]
            # Replace the payload with our list of sanitised parts
            msg.set_payload(payload)
    # Return the sanitised message
    return msg

# Add a simple driver/example to show how to use this function
if __name__ == '__main__':
    import email, sys
    with open(sys.argv[1]) as f:
        m = email.message_from_file(f)
    print(sanitise(m))

Discussion

This issue has come up a few times on the newsgroup. The email parser in Python 2.4 has been completely rewritten to be robust first, correct second. Prior to that version, the parser was written for correctness first. But focusing on correctness was a problem, because many viruses, worms, and other malware routinely send email messages that are broken and nonconformant, malformed to the point that the old email parser chokes and dies. The new parser is designed never to break when reading a message. Instead, it tries its best to fix whatever it can fix in the message. (If you have a message that causes the parser to crash, please let us, the core Python developers, know. It's a bug, and we'll fix it. Please include a copy of the message that makes the parser crash, or else it's very unlikely that we can reproduce your problem!)

The recipe's code itself is fairly well commented and should be easy enough to follow. A mail message consists of one or more parts; each of these parts can contain nested parts. We call the sanitise function on the top-level Message object, and it calls itself recursively on the subobjects if and as needed.
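
To make the nesting concrete, here is a minimal sketch that builds a small message with a nested multipart and an attachment, then walks it with the same is_multipart/get_payload pattern that sanitise uses. The describe helper, the subject line, and the filename are all invented for illustration.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Top-level container with a text part, a nested container, and an attachment.
outer = MIMEMultipart()
outer['Subject'] = 'Quarterly report'
outer.attach(MIMEText('See the attached file.'))

inner = MIMEMultipart('alternative')          # a nested multipart part
inner.attach(MIMEText('plain version'))
inner.attach(MIMEText('<p>html version</p>', 'html'))
outer.attach(inner)

attachment = MIMEApplication(b'MZ\x90\x00', 'octet-stream')
attachment.add_header('Content-Disposition', 'attachment', filename='setup.exe')
outer.attach(attachment)

def describe(part, depth=0):
    """Return (depth, content type, filename) for part and all nested subparts."""
    rows = [(depth, part.get_content_type(), part.get_filename())]
    if part.is_multipart():
        # For a multipart part, get_payload() is a list of sub-messages.
        for sub in part.get_payload():
            rows.extend(describe(sub, depth + 1))
    return rows

rows = describe(outer)
for depth, ctype, filename in rows:
    print('  ' * depth + ctype, filename or '')
```

Only the last part carries a filename, so it is the only one the recipe's extension check would look at.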

The sanitise function first checks the Content-Type of the part and, if the part has a filename, also checks that filename's extension against a known-to-be-bad list. If the part is bad, we replace its payload with a short text description of the now-removed attachment, set the part's Content-Type to 'text/plain' (dropping the old Content-Type parameters), and delete the Content-Transfer-Encoding and Content-Disposition headers, which no longer apply to the new payload.
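
The test itself can be tried in isolation. The sketch below copies the recipe's two regular expressions and wraps the condition in a hypothetical helper, is_suspect, which is not part of the recipe; the sample content types and filenames are invented.

```python
import re

# Same patterns as in the recipe.
BAD_CONTENT_RE = re.compile('application/(msword|msexcel)', re.I)
BAD_FILEEXT_RE = re.compile(r'(\.exe|\.zip|\.pif|\.scr|\.ps)$')

def is_suspect(content_type, filename):
    """Mirror the recipe's condition: bad content type, or bad filename extension."""
    return bool(BAD_CONTENT_RE.search(content_type)
                or (filename and BAD_FILEEXT_RE.search(filename)))

samples = [
    ('application/msword', None),                  # bad type, no filename needed
    ('application/octet-stream', 'setup.exe'),     # bad extension
    ('text/plain', 'notes.txt'),                   # harmless
    ('application/octet-stream', 'SETUP.EXE'),     # uppercase extension
]
for content_type, filename in samples:
    print(content_type, filename, is_suspect(content_type, filename))
```

Note that the content-type pattern is compiled with re.I, while the extension pattern is not, so an uppercase name such as SETUP.EXE is not caught as written. Adding re.I to BAD_FILEEXT_RE would close that gap if case-insensitive matching is what you want.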

Otherwise, we check whether the part is a multipart message. If so, it has subparts, so we recursively call the sanitise function on each of them. We then replace the payload with our list of sanitized subparts.

If you're interested in working further on this recipe, the most important extra functionality, which is easy to add with a small amount of work, might be to store the attached file in some directory (instead of destroying all suspect attachments), and give the user a link to that file. Also consider extending the check in sanitise that filters dangerous attachments to have it verify more than just the content type and file extension; other headers may be able to carry known signs of worm or virus messages.
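
One possible shape for the first suggestion is sketched below. The quarantine_part helper, the file-name prefix, and the sample attachment are assumptions made for this example, not part of the recipe. The sketch lets tempfile choose the name on disk, so the sender-supplied filename is never used as a path.

```python
import os
import tempfile
from email.mime.application import MIMEApplication

def quarantine_part(part, directory):
    """Save a part's decoded payload in directory and return the path written.

    Hypothetical helper: the on-disk name is generated by tempfile, not taken
    from the (untrusted) filename in the message.
    """
    # decode=True undoes base64/quoted-printable; it is None for multipart parts.
    data = part.get_payload(decode=True) or b''
    fd, path = tempfile.mkstemp(prefix='stripped-', suffix='.bin', dir=directory)
    with os.fdopen(fd, 'wb') as out:
        out.write(data)
    return path

# Example: a fake attachment saved into a fresh temporary directory.
part = MIMEApplication(b'not really a program', 'octet-stream')
part.add_header('Content-Disposition', 'attachment', filename='setup.exe')

quarantine_dir = tempfile.mkdtemp()
saved = quarantine_part(part, quarantine_dir)
print(saved)
```

A sanitise variant could call such a helper just before msg.set_payload(replace) and mention the returned path in the replacement text, which gives the user the link the paragraph above describes.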

See Also

Documentation for the standard library modules email and re in the Library Reference and Python in a Nutshell.