Python Cookbook 2nd Edition Jun 1002005 [Electronic resources] (text version)

David Ascher, Alex Martelli, Anna Ravenscroft

Recipe 10.8. Selectively Copying a Mailbox File


Credit: Noah Spurrier, Dave Benjamin


Problem



You need to selectively copy
a large mailbox file (in mbox style), passing each
message through a filtering function that may alter or skip the
message.


Solution


The Python Standard Library package email is the
modern Python approach for this kind of task. However, standard
library modules mailbox and
rfc822 can also supply the base functionality to
implement this task:

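import mailbox

def process_mailbox(mailboxname_in, mailboxname_out, filter_function):
    # Read the input mbox message by message (rfc822-style message objects)
    mbin = mailbox.PortableUnixMailbox(file(mailboxname_in, 'r'))
    fout = file(mailboxname_out, 'w')
    for msg in mbin:
        if msg is None: break
        # Pass the message object and its entire body to the filter
        document = filter_function(msg, msg.fp.read())
        if document:
            assert document.endswith('\n\n')
            # Copy the "From " line and the headers unchanged, then the
            # (possibly altered) body returned by the filter
            fout.write(msg.unixfrom)
            fout.writelines(msg.headers)
            fout.write('\n')
            fout.write(document)
    fout.close()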
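Since the email package is named above as the modern approach, here is a minimal sketch of the same idea for Python 3, where rfc822, file, and mailbox.PortableUnixMailbox no longer exist. It assumes the mailbox.mbox class and email-style message objects; the return value (a count of copied messages) and the handling of multipart messages are choices made for this sketch, not part of the recipe:

```python
import mailbox

def process_mailbox(mailboxname_in, mailboxname_out, filter_function):
    """Copy an mbox file message by message through filter_function.

    Sketch only: filter_function(msg, document) returns False to skip a
    message, or a string to use as the body. Returns the number copied.
    """
    mbin = mailbox.mbox(mailboxname_in, create=False)
    # Note: if the output file already exists, messages are appended to it.
    mbout = mailbox.mbox(mailboxname_out)
    copied = 0
    try:
        for msg in mbin:
            multipart = msg.is_multipart()
            # A multipart payload is a list of sub-messages, not a string,
            # so this sketch hands the filter an empty string in that case.
            document = '' if multipart else msg.get_payload()
            result = filter_function(msg, document)
            if result is False or result is None:
                continue  # the filter asked to skip this message
            if not multipart and result != document:
                msg.set_payload(result)  # write the altered body instead
            mbout.add(msg)
            copied += 1
    finally:
        mbout.close()  # close() flushes pending writes to disk
        mbin.close()
    return copied
```

Unlike the recipe's version, this sketch does not require the body to end with \n\n, because mailbox.mbox writes the separator between messages itself.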


Discussion


I often write lots of little scripts to filter my mailbox, so I wrote
this recipe's small module. I can import the module
from each script and call the module's function
process_mailbox as needed. Python's
future direction is to perform email processing with the standard
library package email, but lower-level modules,
such as mailbox and rfc822, are
still available in the Python Standard Library. They are sometimes
easier to use than the rich, powerful, and very general functionality
offered by package email.

The function you pass to process_mailbox as the
third argument, filter_function, must take two
arguments: msg, an rfc822
message object, and document, a string that is the
message's entire body, ending with two line-end
characters (\n\n).
filter_function can return False,
meaning that this message must be skipped (i.e., not copied at all to
the output), or else it must return a string terminated with
\n\n that is written to the output as the message
body. Normally, filter_function returns either
False or the same document
argument it was called with, but in some cases you may find it useful
to write to the output file an altered version of the
message's body rather than the original message
body.
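To illustrate the "altered version" case described above, here is a small hypothetical filter that skips messages with an empty body and truncates long bodies. The name truncate_body, the MAX_LINES limit, and the marker text are all invented for this sketch; the only part taken from the recipe is the contract (return False to skip, or a string ending in \n\n):

```python
MAX_LINES = 3  # hypothetical limit, chosen small for illustration

def truncate_body(msg, document, max_lines=MAX_LINES):
    """Filter sketch: skip empty bodies, shorten long ones."""
    # Drop the trailing line ends before counting body lines
    lines = document.rstrip('\n').split('\n')
    if not any(line.strip() for line in lines):
        return False  # nothing but whitespace: skip this message
    if len(lines) <= max_lines:
        return document  # short enough: copy the body unchanged
    kept = lines[:max_lines]
    kept.append('[... %d more lines removed ...]' % (len(lines) - max_lines))
    # The returned body must end with two line-end characters
    return '\n'.join(kept) + '\n\n'
```

Because max_lines has a default value, the function can be passed directly as the third argument of process_mailbox, which calls it with just msg and document.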

Here is an example of a filter function that removes duplicate
messages:

import sets
found_ids = sets.Set()
def no_duplicates(msg, document):
    msg_id = msg.getheader('Message-ID')
    if msg_id in found_ids:
        return False
    found_ids.add(msg_id)
    return document

In Python 2.4, you could use the built-in set
rather than sets.Set, but for a case as simple as
this, it makes no real difference in performance (and the usage is
exactly the same, anyway).
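For comparison, here is a sketch of the same duplicate filter using the built-in set. It assumes email-style message objects, which offer msg.get rather than rfc822's getheader, and it keeps the set inside a closure instead of a module-level global. The factory name make_no_duplicates is invented for this sketch, and keeping messages that have no Message-ID header is a deliberate difference from the version above:

```python
def make_no_duplicates():
    """Return a fresh duplicate-removing filter with its own set of IDs."""
    found_ids = set()  # built-in set, available since Python 2.4

    def no_duplicates(msg, document):
        msg_id = msg.get('Message-ID')
        if msg_id is None:
            return document  # no ID to compare: keep the message
        if msg_id in found_ids:
            return False     # seen before: skip this copy
        found_ids.add(msg_id)
        return document

    return no_duplicates
```

Each call to make_no_duplicates() starts with an empty set, so one script can filter several mailboxes independently without clearing shared state.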


See Also


Documentation about modules mailbox and
rfc822, and package email, in
the Library Reference and Python in
a Nutshell.

