Introduction
Credit: Paul F. Dubois, Ph.D., Program for Climate Model
Diagnosis and Intercomparison, Lawrence Livermore National
Laboratory

This chapter was originally meant to cover mainly topics such as
lexing, parsing, and code generation: the classic issues of
programs that are about programs. It turns out,
however, that Pythonistas did not post many recipes about such tasks,
focusing more on highly Python-specific topics such as program
introspection, dynamic importing, and generation of functions by
closure. Many of those recipes, we decided, were more properly
located in various other chapters (on shortcuts, debugging,
object-oriented programming, algorithms, metaprogramming, and
specific areas such as the handling of text, files, and persistence).
Therefore, you will find those topics covered in other chapters. In
this chapter, we included only those recipes that are still best
described as programs about programs. Of these,
probably the most important one is that about
currying, the creation of new functions by
predetermining some arguments of other functions.

This arrangement doesn't mean that the classic
issues aren't important! Python has extensive
facilities related to lexing and parsing, as well as a large number
of user-contributed modules related to parsing standard languages,
which reduces the need for doing your own programming. If Pythonistas
are not using these tools, then, in this one area, they are doing
more work than they need to. Lexing and parsing are among the most
common of programming tasks, and as a result, both are the subject of
much theory and much prior development. Therefore, in these areas
more than most, you will often profit if you take the time to search
for solutions before resorting to writing your own. This Introduction
contains a general guide to solving some common problems in these
categories to encourage reusing the wide base of excellent, solid
code and theory in these fields.
Lexing
Lexing is the process of dividing an input stream into meaningful units,
known as tokens, which are then processed.
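As a minimal sketch of what dividing input into tokens can look like, the following uses a single regular expression with named groups. It is written in Python 3 syntax, and the token kinds (NUMBER, NAME, OP) and the lex function are illustrative assumptions, not part of any library.

```python
import re

# Hypothetical token set for tiny arithmetic expressions.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("NAME",   r"[A-Za-z_]\w*"),
    ("OP",     r"[-+*/()=]"),
    ("SKIP",   r"\s+"),
    ("ERROR",  r"."),
]
TOKEN_RE = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))

def lex(text):
    """Yield (kind, value) pairs for each token found in text."""
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not one itself
        if kind == "ERROR":
            raise ValueError("unexpected character %r" % match.group())
        yield kind, match.group()

tokens = list(lex("rate = 3.5 * (x + 12)"))
print(tokens)
```

Trying the alternatives in a fixed order (NUMBER before NAME, ERROR last) keeps the lexer to a single pass over the input.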
Lexing occurs in tasks such as data processing and in tools for
inspecting and modifying text.

The regular expression facilities in Python are extensive and highly
evolved, so your first consideration for a lexing task is often to
determine whether it can be formulated using regular expressions.
Also, see the next section about parsers for common languages and how
to lex those languages.

The Python Standard Library tokenize module splits
an input stream into Python-language tokens. Since
Python's tokenization rules are similar to those of
many other languages, this module may often be suitable for other
tasks, perhaps with a modest amount of pre- and/or post-processing
around tokenize's own operations.
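A short sketch of this reuse, in Python 3 syntax, with a small amount of post-processing that drops the bookkeeping tokens. The helper name python_tokens and the sample line are assumptions made for illustration.

```python
import io
import tokenize

def python_tokens(source):
    """Return (token-type name, text) pairs for a piece of source text."""
    readline = io.StringIO(source).readline
    # Post-processing: drop line-structure tokens we do not care about here.
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER}
    return [(tokenize.tok_name[tok.type], tok.string)
            for tok in tokenize.generate_tokens(readline)
            if tok.type not in skip]

pairs = python_tokens("total = price * 1.08  # add tax\n")
print(pairs)
```

Because the input only has to follow Python's lexical rules, not its grammar, the same helper can often serve for simple non-Python inputs such as expressions or assignments in a data file.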
For more complex tokenization tasks, Plex, http://nz.cosc.canterbury.ac.nz/~greg/python/Plex/,
can ease your efforts considerably.

At the other end of the lexing complexity spectrum, the built-in
string method split can also be used for many
simple cases. For example, consider a file consisting of
colon-separated text fields, with one record per line. You can split
a line read from the file into its fields as follows:

fields = line.split(':')

This produces a list of the fields. At this point, if you want to
eliminate spurious whitespace at the beginning and end of each
field, you can remove it:

fields = [f.strip() for f in fields]

For example:

>>> x = "abc :def:ghi : klm\n"
>>> fields = x.split(':')
>>> print fields
['abc ', 'def', 'ghi ', ' klm\n']
>>> print [f.strip() for f in fields]
['abc', 'def', 'ghi', 'klm']

Do not elaborate on this example: do not try to over-enrich simple
code to perform lexing and parsing tasks which are in fact quite hard
to perform with generality, solidity, and good performance, and for
which much excellent, reusable code exists. For parsing typical
comma-separated values files, or files using other delimiters, study
the standard Python library module csv. The
ScientificPython package, http://starship.python.net/~hinsen/ScientificPython/,
includes a module for reading and writing with Fortran-like formats,
and other such precious I/O modules, in the
Scientific.IO sub-package.

A common "gotcha" for beginners is
that, while lexing and other text-parsing techniques can be used to
read numerical data from a file, at the end of this stage, the
entries are text strings, not numbers. The int and
float built-in functions are frequently needed
here, to turn each field from a string into a number:
>>> x = "1.2, 2.3, 4, 5.6"
>>> print [float(y.strip()) for y in x.split(',')]
[1.2, 2.2999999999999998, 4.0, 5.5999999999999996]
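For the comma-separated case mentioned above, here is a minimal sketch of reusing the standard csv module instead of a hand-written split. It is in Python 3 syntax; the sample data, the column names, and the read_scores helper are hypothetical.

```python
import csv
import io

# Hypothetical sample data; a real program would open a file instead.
SAMPLE = 'name,score\n"Smith, Jane",4.5\nLee,3\n'

def read_scores(text):
    """Return {name: score} from CSV text that starts with a header row."""
    reader = csv.DictReader(io.StringIO(text))
    # Fields arrive as strings, so the score still needs float().
    return {row["name"]: float(row["score"]) for row in reader}

scores = read_scores(SAMPLE)
print(scores)
```

The quoted field containing a comma is exactly the kind of case that a plain split(',') would get wrong and that the csv module already handles.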
Parsing
Parsing refers to
discovering semantic meaning from a series of tokens according to the
rules of a grammar. Parsing tasks are quite ubiquitous. Programming
tools may attempt to discover information about program texts or to
modify such texts to fit a task. (Python's
introspection capabilities come into play here, as we will discuss
later.) Little languages is the generic name
given to application-specific languages that serve as human-readable
forms of computer input. Such languages can vary from simple lists of
commands and arguments to full-blown languages.

The grammar in the previous lexing example was implicit: the data you
need is organized as one line per record with the fields separated by
a special character. The "parser"
in that case was supplied by the programmer reading the lines from
the file and applying the simple split method to
obtain the information. This sort of input file can easily grow,
leading to requests for a more elaborate form. For example, users may
wish to use comments, blank lines, conditional statements, or
alternate forms. While most such parsing can be handled with simple
logic, at some point, it becomes so complicated that it is much more
reliable to use a real grammar.

There is no hard-and-fast way to decide which part of the job is a
lexing task and which belongs to the grammar. For example, comments
can often be discarded in the lexing, but doing so is not wise in a
program-transformation tool that must produce output containing the
original comments.

Your strategy for parsing tasks can include:
- Using a parser for that language from the Python Standard Library.
- Using a parser from the user community. You can often find one by
 visiting the Vaults of Parnassus site,
 http://www.vex.net/parnassus/, or
 by searching the Python site, http://www.python.org.
- Generating a parser using a parser generator.
- Using Python itself as your input language.
A combination of approaches is often fruitful. For example, a simple
parser can turn input into Python-language statements, which Python
then executes in concert with a supporting package that you supply.

A number of parsers for specific languages exist in the standard
library, and more are out there on the Web, supplied by the user
community. In particular, the standard library includes parsing
packages for XML, HTML, SGML, command-line arguments, configuration
files, and for Python itself. For the now-ubiquitous task of parsing
XML specifically, this cookbook includes a chapter, Chapter 14,
dedicated to XML.

You do not have to parse C to connect C routines to Python. Use SWIG
(http://www.swig.org). Likewise,
you do not need a Fortran parser to connect Fortran and Python. See
the Numerical Python web page at http://www.pfdubois.com/numpy/ for further
information. Again, this cookbook includes a chapter, Chapter 17,
which is dedicated to these kinds of tasks.
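Returning to the standard library parsers listed earlier, the following sketch (Python 3 syntax) shows two of them: configparser for configuration files and ast for Python itself. The configuration text, the section and option names, and the sample expression are made up for illustration.

```python
import ast
import configparser

# Hypothetical configuration text; a real program would read a file.
CONFIG_TEXT = """
[server]
host = example.org
port = 8080
"""

config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)
host = config["server"]["host"]
port = config.getint("server", "port")   # converts the string for us

# The ast module parses Python source into a syntax tree without running it.
tree = ast.parse("area = width * height")
names = sorted(node.id for node in ast.walk(tree)
               if isinstance(node, ast.Name))

print(host, port, names)
```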
PLY, SPARK, and Other Python Parser Generators
PLY and SPARK are two rich, solid, and mature Python-based parser
generators. That
is, they take as their input some statements that describe the
grammar to be parsed and generate the parser for you. To make a
useful tool, you must add the semantic actions to be taken when a
certain construct in the grammar is recognized.

PLY (http://systems.cs.uchicago.edu/ply) is a
Python implementation of the popular Unix tool
yacc. SPARK (http://pages.cpcc.ucalgary-ca/~aycoch/spart/contentl)
parses a more general set of grammars than yacc.
Both tools use Python introspection, including the idea of placing
grammar rules in functions' docstrings.

Parser generators are one of the many application areas that may have
even too many excellent tools, so that you may end up frustrated by
having to pick just one. Besides SPARK and PLY, other Python tools in
this field include TPG (Toy Parser Generator), DParser, PyParsing,
kwParsing (or kyParsing), PyLR, Yapps, PyGgy, mx.TextTools and its
SimpleParse frontend: too many to provide more than a bare
mention of each, so, happy googling!

The chief problem in using any of these tools is that you need to
educate yourself about grammars and learn to write them. A novice
without any computer science background will encounter some
difficulty except with very simple grammars. A lot of literature is
available to teach you how to use yacc, and most
of this knowledge will help you use SPARK and most of the others just
as well.

If you are interested in this area, the penultimate reference is
Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman,
Compilers (Addison-Wesley), affectionately
known as "the Dragon Book" to
generations of computer science majors.[1]
[1] I'd even call this book the ultimate reference, were
it not for the fact that Donald Knuth continues to promise that the
fifth volume (current ETA, the year 2010) of his epoch-making
The Art of Computer Programming will be about
this very subject.
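The idea of keeping grammar rules in docstrings, mentioned above for PLY and SPARK, can be illustrated without either tool. The sketch below (Python 3 syntax) only collects rules and their semantic actions into a table; it does not parse anything, and the p_ prefix, the rule notation, and all names are assumptions for illustration, not the actual API of PLY or SPARK.

```python
import inspect

class CalcRules:
    """Hypothetical rule container; each docstring holds one grammar rule."""
    def p_add(self, args):
        "expr ::= expr + term"
        return args[0] + args[2]

    def p_term(self, args):
        "expr ::= term"
        return args[0]

def collect_rules(obj):
    """Map each grammar rule found in a p_* docstring to its action."""
    table = {}
    for name, method in inspect.getmembers(obj, inspect.ismethod):
        if name.startswith("p_") and method.__doc__:
            table[method.__doc__.strip()] = method
    return table

rules = collect_rules(CalcRules())
print(sorted(rules))

# A parser would call the action once it recognized the construct.
result = rules["expr ::= expr + term"]([2, "+", 3])
print(result)
```

A real parser generator does much more (building parse tables and driving the actions from the token stream), but the rule text and the code that acts on it live side by side in the same way.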
Using Python Itself as a Little Language
Python itself can be used to create
many application-specific languages. By writing suitable classes, you
can rapidly create a program that is easy to get running, yet is
extensible later. Suppose I want a language to describe graphs. Nodes
have names, and edges connect the nodes. I want a way to input such
graphs, so that after reading the input I will have the data
structures in Python that I need for any further processing. So, for
example:

nodes = {}

def getnode(name):
    " Return the node with the given name, creating it if necessary. "
    # The local variable must not be called 'node': that would shadow
    # the node class below and raise UnboundLocalError on node(name).
    if name in nodes:
        result = nodes[name]
    else:
        result = nodes[name] = node(name)
    return result

class node(object):
    " A node has a name and a list of edges emanating from it. "
    def __init__(self, name):
        self.name = name
        self.edgelist = []

class edge(object):
    " An edge connects two nodes. "
    def __init__(self, name1, name2):
        self.nodes = getnode(name1), getnode(name2)
        for n in self.nodes:
            n.edgelist.append(self)
    def __repr__(self):
        return self.nodes[0].name + self.nodes[1].name

Using just these simple statements, I can now parse a list of edges
that describe a graph, and afterwards, I will have data
structures that contain all my information. Here, I enter a graph
with four edges and print the list of edges emanating from node
'A':

>>> edge('A', 'B')
AB
>>> edge('B', 'C')
BC
>>> edge('C', 'D')
CD
>>> edge('C', 'A')
CA
>>> print getnode('A').edgelist
[AB, CA]

Suppose that I now want a weighted graph. I could easily add a
weight=1.0 default argument to the edge
constructor, and the old input would still work. Also, I could easily
add error-checking logic to ensure that edge lists have no
duplicates. Furthermore, I already have my node class and can start
adding logic to it for any needed processing purposes, be it directly
or by subclassing. I can easily turn the entries in the dictionary
nodes into similarly named variables that are bound to the node
objects. After adding a few more classes corresponding to other input
I need, I am well on my way.

The advantage to this approach is clear. For example, the following
is already handled correctly:

edge('A', 'B')
if 'X' in nodes:
    edge('X', 'A')
def triangle(n1, n2, n3):
    edge(n1, n2)
    edge(n2, n3)
    edge(n3, n1)
triangle('A', 'W', 'K')
execfile('mygraph.txt')     # Read graph from a datafile

So I already have syntactic sugar, user-defined language extensions,
and input from other files. The definitions usually go into a module,
and the user simply imports them. Had I written my own language,
instead of reusing Python in this little
language role, such accomplishments might be months
away.
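The same approach can be sketched in Python 3 syntax, where execfile no longer exists and exec with an explicit namespace plays its role. This is one possible arrangement under stated assumptions, not the author's code: the load_graph helper, the capitalized class names, and the sample input are hypothetical, and the weight=1.0 default follows the extension suggested in the text.

```python
class Node:
    """A node has a name and a list of edges emanating from it."""
    def __init__(self, name):
        self.name = name
        self.edgelist = []

def load_graph(source):
    """Run source as a graph description and return {name: Node}."""
    nodes = {}

    def getnode(name):
        # Create the node on first use.
        if name not in nodes:
            nodes[name] = Node(name)
        return nodes[name]

    class Edge:
        """An edge connects two nodes and carries an optional weight."""
        def __init__(self, name1, name2, weight=1.0):
            self.nodes = getnode(name1), getnode(name2)
            self.weight = weight
            for n in self.nodes:
                n.edgelist.append(self)
        def __repr__(self):
            return self.nodes[0].name + self.nodes[1].name

    # The input sees these names plus Python's built-ins, so this is
    # NOT a sandbox: only run input that you trust.
    namespace = {"edge": Edge, "nodes": nodes}
    exec(source, namespace)
    return nodes

# Hypothetical input; it could equally be the contents of a data file.
GRAPH_TEXT = """
edge('A', 'B')
edge('B', 'C', weight=2.5)
def triangle(n1, n2, n3):
    edge(n1, n2)
    edge(n2, n3)
    edge(n3, n1)
triangle('C', 'D', 'A')
"""

graph = load_graph(GRAPH_TEXT)
print(graph['A'].edgelist)
```

Keeping the node table inside load_graph means that each input gets a fresh graph, at the cost of a little more setup than module-level definitions.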
Introspection
Python programs have the ability to examine themselves; this set of
facilities comes under the general title of introspection. For
example, a Python function object knows a lot about itself, including
the names of its arguments, and the docstring that was given when it
was defined:

>>> def f(a, b):
...     " Return the difference of a and b "
...     return a-b
...
>>> dir(f)
['__call__', '__class__', '__delattr__',
'__dict__', '__doc__',
'__get__', '__getattribute__', '__hash__', '__init__',
'__module__', '__name__',
'__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__setattr__', '__str__', 'func_closure',
'func_code', 'func_defaults',
'func_dict', 'func_doc', 'func_globals', 'func_name']
>>> f.func_name
'f'
>>> f.func_doc
' Return the difference of a and b '
>>> f.func_code
<code object f at 0175DDF0, file "<pyshell#18>", line 1>
>>> dir(f.func_code)
['__class__', '__cmp__', '__delattr__', '__doc__',
'__getattribute__', '__hash__', '__init__',
'__new__', '__reduce__',
'__reduce_ex__', '__repr__', '__setattr__',
'__str__', 'co_argcount',
'co_cellvars', 'co_code', 'co_consts',
'co_filename', 'co_firstlineno',
'co_flags', 'co_freevars', 'co_lnotab',
'co_name', 'co_names',
'co_nlocals', 'co_stacksize', 'co_varnames']
>>> f.func_code.co_varnames
('a', 'b')

SPARK and PLY make an interesting use of introspection. The grammar
is entered as docstrings in the routines that take the semantic
actions when those grammar constructs are recognized. (Hey,
don't turn your head all the way around like that!
Introspection has its limits.)

Introspection is very popular in the Python community, and you will
find many examples of it in recipes in this book, both in this
chapter and elsewhere. Even in this field, though,
always remember the possibility of reuse!
Standard library module inspect has a lot of
solid, reusable inspection-related code. It's all
pure Python code, and you can (and should) study the
inspect.py source file in your Python library to
see what "raw" facilities underlie
inspect's elegant high-level
functions. Indeed, this suggestion generalizes: studying the
standard library's sources is among the best things
you can do to increase your Python knowledge and skill. But
reusing the standard library's
wealth of modules and packages is still best: any code you
don't write is code you don't have
to maintain, and solid, heavily tested code such as the code that you
find in the standard library is very likely to have far fewer bugs
than any newly developed code you might write yourself.

Python is the most powerful language that you can still read. The
kinds of tasks discussed in this chapter help to show just how
versatile and powerful it really is.
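As a small sketch of the reuse recommended above, the high-level functions of the inspect module can report a function's parameters and docstring without touching low-level attributes. The example is in Python 3 syntax, and the difference function is a stand-in written for illustration.

```python
import inspect

def difference(a, b=0):
    """Return the difference of a and b."""
    return a - b

signature = inspect.signature(difference)
param_names = list(signature.parameters)
# Keep only the parameters that actually declare a default value.
defaults = {name: p.default
            for name, p in signature.parameters.items()
            if p.default is not inspect.Parameter.empty}
doc = inspect.getdoc(difference)

print(param_names, defaults, doc)
```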