Recipe 10.9. Building a Whitelist of Email Addresses From a Mailbox
Credit: Noah Spurrier
Problem
To help you configure
an antispam system, you want a list of email addresses, commonly
known as a whitelist, that you can trust
won't send you spam. The addresses to which you send
email are undoubtedly good candidates for this
whitelist.
Solution
Here is a script to output "To"
addresses given a mailbox path:
#!/usr/bin/env python
""" Extract and print all 'To:' addresses from a mailbox """
import mailbox
def main(mailbox_path):
    addresses = { }
    mb = mailbox.PortableUnixMailbox(file(mailbox_path))
    for msg in mb:
        toaddr = msg.getaddr('To')[1]
        addresses[toaddr] = 1
    addresses = addresses.keys( )
    addresses.sort( )
    for address in addresses:
        print address
if __name__ == '__main__':
    import sys
    main(sys.argv[1])
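One caveat about the script above: getaddr yields only the first address found in the To header, so a message sent to several people contributes a single entry. As a minimal sketch of collecting every recipient, assuming a modern Python 3 interpreter (where the getaddr method used above is not available) and a hypothetical helper named all_recipients, email.utils.getaddresses can parse complete header values:

```python
from email.utils import getaddresses

def all_recipients(header_values):
    """Return the sorted, de-duplicated addresses found in 'To' header values.

    all_recipients is a hypothetical helper name, not part of the recipe.
    """
    # getaddresses takes a list of header strings and returns
    # (display name, address) pairs for every address it can parse.
    pairs = getaddresses(header_values)
    # A set comprehension drops duplicates; empty addresses are skipped.
    return sorted({addr for name, addr in pairs if addr})

if __name__ == '__main__':
    # Illustrative header values only, not taken from a real mailbox.
    headers = ['Alice <alice@example.com>, bob@example.com',
               'alice@example.com']
    for address in all_recipients(headers):
        print(address)
```

Addresses are compared exactly as written here; whether to fold case before de-duplicating is a choice left to the reader.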
Discussion
In addition to bypassing spam filters, identifying addresses of
people you've sent mail to may also help in other
ways, such as flagging emails from them as higher priority, depending
on your mail-reading habits and your mail reader's
capabilities. As long as your mail reader keeps mail you have sent in
some kind of "Sent Items" mailbox
in standard mailbox format, you can call this script with the path to
the mailbox as its only argument, and the addresses to which
you've sent mail will be emitted to standard output.
The script is simple because the Python Standard Library module
mailbox does all the hard work. All the script
needs to do is collect the set of email addresses as it loops through
all messages, then emit them. While collecting, we keep
addresses as a dictionary, since
that's much faster than keeping a list and checking
each toaddr in order to append it only if it
wasn't already in the list. When
we're done collecting, we just extract the addresses
from the dictionary as a list because we want to emit its items in
sorted order. In Python 2.4, function main can be
made slightly more elegant, thanks to the new built-ins
set and sorted:
def main(mailbox_path):
    addresses = set( )
    mb = mailbox.PortableUnixMailbox(file(mailbox_path))
    for msg in mb:
        toaddr = msg.getaddr('To')[1]
        addresses.add(toaddr)
    for address in sorted(addresses):
        print address
If your mailbox is not in the Unix mailbox style supported by
mailbox.PortableUnixMailbox, you may want to use
other classes supplied by the Python Standard Library module
mailbox. For example, if your mailbox is in Qmail
maildir format, you can use the
mailbox.Maildir class to read it.
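As a rough illustration of switching mailbox classes, the sketch below assumes Python 3, where mailbox.mbox takes the place of PortableUnixMailbox and each message is an email.message.Message object. The helper name to_addresses and the optional second command-line argument are hypothetical, introduced only for this example. Because the function merely iterates over whatever mailbox object it is given, the same loop serves both formats:

```python
import mailbox
import sys
from email.utils import getaddresses

def to_addresses(mb):
    """Return the sorted set of 'To' addresses in an open mailbox object.

    to_addresses is a hypothetical helper name, not part of the recipe.
    """
    addresses = set()
    for msg in mb:
        # get_all returns every 'To' header in the message, or [] if none.
        for name, addr in getaddresses(msg.get_all('To', [])):
            if addr:
                addresses.add(addr)
    return sorted(addresses)

if __name__ == '__main__' and len(sys.argv) > 1:
    path = sys.argv[1]
    # Pick the class matching the on-disk format. Passing 'maildir' as a
    # second argument is an assumption of this sketch, not a convention.
    if len(sys.argv) > 2 and sys.argv[2] == 'maildir':
        mb = mailbox.Maildir(path, create=False)
    else:
        mb = mailbox.mbox(path, create=False)
    for address in to_addresses(mb):
        print(address)
```

Passing create=False keeps a mistyped path from silently creating a new, empty mailbox.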
See Also
Documentation of the standard library module
mailbox in the Library
Reference and Python in a
Nutshell.