Recipe 14.11. Generating OPML Files
Credit: Moshe Zadka, Premshree Pillai, Anna Martelli Ravenscroft
Problem
OPML (Outline Processor Markup Language) is a standard file format
for sharing subscription lists
used by RSS (Really Simple Syndication) feed readers and aggregators.
You want to share your subscription list, but your blogging site
provides only a FOAF (Friend-Of-A-Friend) page, not one in the
standard OPML format.
Solution
Use urllib2 to open and read the FOAF page and
xml.dom to parse the data received; then, output
the data in the proper OPML format to a file. For example,
LiveJournal is a popular blogging site that provides FOAF pages;
here's a module with the functions you need to turn
those pages into OPML files:
#!/usr/bin/python
import sys
import urllib2
import HTMLParser
from xml.dom import minidom, Node

def getElements(node, uri, name):
    ''' recursively yield all elements w/given namespace URI and name '''
    if (node.nodeType==Node.ELEMENT_NODE and
        node.namespaceURI==uri and
        node.localName==name):
        yield node
    for child in node.childNodes:
        for found in getElements(child, uri, name):
            yield found

class LinkGetter(HTMLParser.HTMLParser):
    ''' HTML parser subclass which collects attributes of link tags '''
    def __init__(self):
        HTMLParser.HTMLParser.__init__(self)
        self.links = [ ]
    def handle_starttag(self, tag, attrs):
        if tag == 'link':
            self.links.append(attrs)

def getRSS(page):
    ''' given a `page' URL, returns the HREF to the RSS link
        (or None if the page advertises no RSS feed) '''
    contents = urllib2.urlopen(page)
    lg = LinkGetter( )
    try:
        # the link tags live in the page's head, so the first 1000
        # bytes are normally enough
        lg.feed(contents.read(1000))
    except HTMLParser.HTMLParseError:
        # a truncated or malformed page: use whatever links we collected
        pass
    links = map(dict, lg.links)
    for link in links:
        if (link.get('rel')=='alternate' and
            link.get('type')=='application/rss+xml'):
            return link.get('href')

def getNicks(doc):
    ''' given an XML document's DOM, `doc', yields a triple of info for
        each contact: nickname, blog URL, RSS URL '''
    for element in getElements(doc, 'http://xmlns.com/foaf/0.1/', 'knows'):
        person, = getElements(element, 'http://xmlns.com/foaf/0.1/', 'Person')
        nick, = getElements(person, 'http://xmlns.com/foaf/0.1/', 'nick')
        text, = nick.childNodes
        nickText = text.toxml( )
        blog, = getElements(person, 'http://xmlns.com/foaf/0.1/', 'weblog')
        blogLocation = blog.getAttributeNS(
            'http://www.w3.org/1999/02/22-rdf-syntax-ns#', 'resource')
        rss = getRSS(blogLocation)
        if rss:
            yield nickText, blogLocation, rss

def nickToOPMLFragment((nick, blogLocation, rss)):
    ''' given a triple (nickname, blog URL, RSS URL), returns a string
        that's the proper OPML outline tag representing that info '''
    return '''
<outline text="%(nick)s"
htmlUrl="%(blogLocation)s"
type="rss"
xmlUrl="%(rss)s"/>
''' % dict(nick=nick, blogLocation=blogLocation, rss=rss)

def nicksToOPML(fout, nicks):
    ''' writes to file `fout' the OPML document representing the
        iterable of contact information `nicks' '''
    fout.write('''<?xml version="1.0" encoding="utf-8"?>
<opml version="1.0">
<head><title>Subscriptions</title></head>
<body><outline title="Subscriptions">
''')
    for nick in nicks:
        # show progress on standard output, one contact per line
        print nick
        fout.write(nickToOPMLFragment(nick))
    fout.write("</outline></body></opml>\n")

def docToOPML(fout, doc):
    ''' writes to file `fout' the OPML for XML DOM `doc' '''
    nicksToOPML(fout, getNicks(doc))

def convertFOAFToOPML(foaf, opml):
    ''' given URL `foaf' to a FOAF page, writes its OPML equivalent to
        a file named by string `opml' '''
    f = urllib2.urlopen(foaf)
    doc = minidom.parse(f)
    docToOPML(file(opml, 'w'), doc)

def getLJUser(user):
    ''' writes an OPML file `user'.opml for livejournal's FOAF page '''
    convertFOAFToOPML(
        'http://www.livejournal.com/users/%s/data/foaf' % user,
        user+".opml")

if __name__ == '__main__':
    # example, when this module is run as a main script
    getLJUser('moshez')
Discussion
RSS feeds have become extremely popular for reading news, blogs,
wikis, and so on. OPML is one of the standard file formats used to
share subscription lists among RSS fans. This recipe generates an
OPML file that can be opened with any RSS reader. With an OPML file,
you can share your favorite subscriptions with anyone you like,
publish it to the Web, and so on.

getElements is a convenience function that gets
written in almost every XML DOM-processing application. It
recursively scans the document, finding nodes that satisfy certain
criteria. This version of getElements is somewhat
quick and dirty, but it is good enough for our purposes.
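
As a rough illustration of what getElements does, the sketch below runs the same generator over a tiny FOAF fragment held in a string and compares the result with minidom's own getElementsByTagNameNS method, which performs a similar recursive search but returns a list rather than yielding lazily. The sample document and the names SAMPLE_FOAF and FOAF_NS are made up for this example, and the code is written for Python 3 rather than the Python 2 used in the recipe.

```python
from xml.dom import minidom, Node

FOAF_NS = 'http://xmlns.com/foaf/0.1/'

# Hypothetical sample data, not taken from any real FOAF page.
SAMPLE_FOAF = '''<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:nick>alice</foaf:nick>
    <foaf:knows>
      <foaf:Person><foaf:nick>bob</foaf:nick></foaf:Person>
    </foaf:knows>
    <foaf:knows>
      <foaf:Person><foaf:nick>carol</foaf:nick></foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>'''

def getElements(node, uri, name):
    """Recursively yield all elements with the given namespace URI and name."""
    if (node.nodeType == Node.ELEMENT_NODE and
            node.namespaceURI == uri and
            node.localName == name):
        yield node
    for child in node.childNodes:
        for found in getElements(child, uri, name):
            yield found

doc = minidom.parseString(SAMPLE_FOAF)

# Nicknames found by the recipe's generator, in document order.
nicks = [el.firstChild.data for el in getElements(doc, FOAF_NS, 'nick')]

# The same search using the method that minidom documents provide.
builtin = [el.firstChild.data
           for el in doc.getElementsByTagNameNS(FOAF_NS, 'nick')]

# Only the two foaf:knows elements, as getNicks would iterate over them.
knows = list(getElements(doc, FOAF_NS, 'knows'))
```

One difference worth keeping in mind: when called on an element rather than on the whole document, getElementsByTagNameNS searches only the element's descendants, while the generator also considers the starting node itself.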
getNicks is where the heart of the parsing lies. It calls
getElements to look for "foaf:knows" nodes, and inside
those, it looks for the "foaf:nick"
element, which contains the LiveJournal nickname of the user. Being a
generator, it yields one (nickname, blog URL, RSS URL) triple for each
contact in this FOAF document whose blog advertises an RSS feed.

Note an important idiom used four times in the body of
getNicks:
name, = some iterable

The key is the comma after name, which
turns the left-hand side of this assignment into a one-item
tuple, making the assignment into
what's technically known as an unpacking
assignment. Unpacking assignments are of course very
popular in Python (see Recipe 19.4 for a technique to make them
even more widely applicable) but normally with at least two names on
the left of the assignment, such as:
aname, another = iterable yielding 2 items

The idiom used in getNicks has exactly the same
function, but it demands that the iterable yield exactly
one item (otherwise, Python raises a
ValueError exception). Therefore, the idiom has
the same semantics as:
_templist = list(some iterable)
if len(_templist) != 1:
    raise ValueError, 'too many values to unpack'
name = _templist[0]
del _templist

Obviously, the name, = ... idiom is much cleaner
and more compact than this equivalent snippet, which is worth keeping
in mind for the next time you need to express the same semantics.

nicksToOPML, together with its helper function
nickToOPMLFragment, generates the OPML, while
docToOPML ties together getNicks
and nicksToOPML into a FOAF->OPML converter.
convertFOAFToOPML is the main function, which actually
interacts with the operating system (accessing the network to get the
FOAF, and using a file to save the OPML).

The recipe has a specific function getLJUser(user)
to work with the LiveJournal (http://www.livejournal.com) friends lists.
However, the point is that the main
convertFOAFToOPML function is general enough to use
for other sites as well. The various helper functions can also come
in handy in your own different but related tasks. For example, the
getRSS function (with some aid from its helper class
LinkGetter) finds and returns a link to the RSS feed
(if one exists) for a given web site.
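
To try the link-finding logic of getRSS without touching the network, the following sketch feeds an HTML string to a LinkGetter-style parser. It is written for Python 3, where the parser class lives in html.parser rather than in the HTMLParser module the recipe imports. The helper name findRSS and the sample page are assumptions made for this example, not part of the recipe.

```python
from html.parser import HTMLParser

class LinkGetter(HTMLParser):
    """HTML parser subclass that collects the attributes of link tags."""
    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; keep it as a dict
        if tag == 'link':
            self.links.append(dict(attrs))

def findRSS(html):
    """Return the href of the first RSS <link> in `html`, or None."""
    lg = LinkGetter()
    lg.feed(html)
    for link in lg.links:
        if (link.get('rel') == 'alternate' and
                link.get('type') == 'application/rss+xml'):
            return link.get('href')
    return None

# Hypothetical page head with one stylesheet link and one RSS link.
SAMPLE_PAGE = '''<html><head>
<link rel="stylesheet" type="text/css" href="/style.css">
<link rel="alternate" type="application/rss+xml"
      href="http://example.com/bob/rss">
</head><body>...</body></html>'''

rss = findRSS(SAMPLE_PAGE)
no_rss = findRSS('<html><head></head></html>')
```

Separating the parsing step from the download, as findRSS does here, makes the logic easy to exercise on saved pages before pointing it at a live site.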
See Also
About OPML, http://feeds.scripting.com/whatIsOpml; for
more on RSS readers, http://blogspace.com/rss/readers; for FOAF
Vocabulary Specification, http://xmlns.com/foaf/0.1/.