Python Cookbook 2nd Edition Jun 1002005 [Electronic resources]

David Ascher, Alex Martelli, Anna Ravenscroft

Recipe 13.7. Unpacking a Multipart MIME Message

Credit: Matthew Cowles

Problem

You want to unpack a multipart MIME message.

Solution

The walk method of message objects generated by the email package makes this task really easy. Here is a script that uses email to solve the task posed in the "Problem":

import email.Parser
import os, sys

def main():
    if len(sys.argv) != 2:
        print "Usage: %s filename" % os.path.basename(sys.argv[0])
        sys.exit(1)
    mailFile = open(sys.argv[1], "rb")
    p = email.Parser.Parser()
    msg = p.parse(mailFile)
    mailFile.close()
    partCounter = 1
    for part in msg.walk():
        # Skip container parts; walk() visits their children anyway
        if part.get_main_type() == "multipart":
            continue
        name = part.get_param("name")
        if name is None:
            name = "part-%i" % partCounter
            partCounter += 1
        # In real life, make sure that name is a reasonable filename
        # for your OS; otherwise, mangle that name until it is!
        f = open(name, "wb")
        f.write(part.get_payload(decode=1))
        f.close()
        print name

if __name__ == "__main__":
    main()
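
The comment in the script about mangling the name deserves a concrete illustration, since attachment names come from whoever sent the message. What follows is only a minimal sketch, written for Python 3, of one conservative way to do it. The helper name safe_filename, its fallback argument, and the whitelist of allowed characters are all assumptions made for this example and are not part of the recipe.

```python
import os
import re

def safe_filename(name, fallback="part"):
    """Reduce an untrusted attachment name to a conservative local filename.

    Hypothetical helper: the character whitelist below is an assumption,
    not a rule taken from the recipe or from any particular OS.
    """
    # Treat backslashes as separators too, then keep only the last component.
    name = os.path.basename(name.replace("\\", "/"))
    # Replace anything outside a small whitelist with an underscore.
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name)
    # Drop leading dots so the result is never ".", "..", or a hidden file.
    name = name.lstrip(".")
    return name or fallback
```

With a helper along these lines, the script could call open(safe_filename(name), "wb") instead of trusting the name as received. Stricter or looser rules may suit your platform better.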

Discussion

The email package makes parsing MIME messages reasonably easy. This recipe shows how to unbundle a MIME message with the email package by using the walk method of message objects.

You can create a message object in several ways. For example, you can instantiate the email.Message.Message class and build the message object's contents with calls to its methods. In this recipe, however, I need to read and analyze an existing message, so I work the other way around, calling the parse method of an email.Parser.Parser instance. The parse method takes as its only argument a file-like object (in the recipe, I pass it a real file object that I just opened for binary reading with the built-in open function) and returns a message object, on which you can call message object methods.
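
To make the two approaches concrete, here is a small sketch of both: building a message object through method calls, then parsing serialized bytes back into a message object from a file-like object. It is written for Python 3, where I am assuming the modern equivalents email.message.EmailMessage and email.message_from_binary_file stand in for the email.Message.Message class and the Parser instance discussed above. The subject, body, and attachment are made-up sample data.

```python
import email
import io
from email.message import EmailMessage

# Way 1: instantiate a message class and fill it in with method calls.
# (Sample data only; nothing here comes from the recipe.)
original = EmailMessage()
original["Subject"] = "demo"
original.set_content("Hello, world.")
original.add_attachment(b"\x00\x01\x02", maintype="application",
                        subtype="octet-stream", filename="blob.bin")

# Way 2: parse an existing message from a binary file-like object.
# io.BytesIO stands in for a real file opened with mode "rb".
raw = original.as_bytes()
msg = email.message_from_binary_file(io.BytesIO(raw))
```

The parsed msg is again a message object, so methods such as is_multipart and walk are available on it just as they are on original.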

The walk method is a generator (i.e., it returns an iterator object on which you can loop with a for statement). You usually will use this method exactly as I use it in this recipe:

for part in msg.walk( ):

The iterator sequentially returns (depth-first, in case of nesting) the parts that make up the message. If the message is not a container of parts (i.e., it has no attachments or alternates, so message.is_multipart returns false), no problem: the walk method then returns an iterator with a single element, the message itself. In any case, each element of the iterator is also a message object (an instance of email.Message.Message), so you can call on it any of the methods that a message object supplies.
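
The traversal order is easier to see on a small nested message. The following is a sketch for Python 3, assuming email.message.EmailMessage as the message class; the bodies and the attachment are invented sample data.

```python
from email.message import EmailMessage

# A single-part message: walk() yields only the message itself.
simple = EmailMessage()
simple.set_content("just text")
simple_parts = list(simple.walk())

# A nested message: a mixed container holding an alternative container
# (plain and HTML bodies) followed by one attachment.
nested = EmailMessage()
nested.set_content("plain body")
nested.add_alternative("<p>html body</p>", subtype="html")
nested.add_attachment(b"data", maintype="application",
                      subtype="octet-stream", filename="a.bin")

# Depth-first: each container is yielded before the parts it holds.
walk_types = [part.get_content_type() for part in nested.walk()]
```

Printing walk_types shows each container immediately followed by its children, which is why a loop that wants only the real parts has to skip the containers.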

In a multipart message, parts with a type of 'multipart/something' (i.e., a main type of 'multipart') may be present. In this recipe, I skip them explicitly since they're just glue holding the true parts together. I use the get_main_type method to obtain the main type and check it for equality with 'multipart'; if equality holds, I skip this part and move to the next one with a continue statement. When I know I have a real part in hand, I locate its name (or synthesize one if it has no name), open a file with that name, and write into it the part's contents (also known as its payload), which I get by calling the get_payload method. I use the decode=1 argument to ensure that the payload is decoded back to binary content (e.g., an image, a sound file, a movie) if needed, rather than remaining in its encoded text form. If the payload is not encoded, decode=1 is innocuous, so I don't have to check before I pass it.
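
For readers on Python 3, the same loop can be sketched as follows. This is not the recipe's code. I am assuming get_content_maintype as the current spelling of the main-type check, and I substitute get_filename for get_param("name") because it also consults the Content-Disposition header. The function name extract_parts and the sample message are hypothetical, and the sketch collects payloads in a list rather than writing files so that it can be tried safely.

```python
import email
from email.message import EmailMessage

def extract_parts(msg):
    """Return (name, payload_bytes) pairs for every non-container part."""
    results = []
    counter = 1
    for part in msg.walk():
        # Containers are only glue; walk() visits their children anyway.
        if part.get_content_maintype() == "multipart":
            continue
        name = part.get_filename()
        if name is None:
            name = "part-%i" % counter
            counter += 1
        # decode=True undoes base64 or quoted-printable transfer encoding
        # and is harmless when the payload is not encoded.
        results.append((name, part.get_payload(decode=True)))
    return results

# Made-up sample: a text body plus a small binary attachment, which is
# base64-encoded when the message is serialized.
sample = EmailMessage()
sample.set_content("hello")
sample.add_attachment(bytes(range(8)), maintype="application",
                      subtype="octet-stream", filename="blob.bin")
parsed = email.message_from_bytes(sample.as_bytes())
extracted = extract_parts(parsed)
```

The attachment comes back as the original eight bytes even though it travels as base64 text inside the serialized message, which is the effect of the decode argument described above.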

See Also

Recipe 13.6; documentation for the standard library package email in the Library Reference.