Python Cookbook 2Nd Edition Jun 1002005 [Electronic resources] نسخه متنی

با بیش از 100000 منبع الکترونیکی رایگان به زبان فارسی ، عربی و انگلیسی

Recipe 1.18. Replacing Multiple Patterns in a Single Pass

Credit: Xavier Defrang, Alex Martelli

Problem

You need to perform several string substitutions on a
string.

Solution

Sometimes regular expressions
afford the fastest solution even in cases where their applicability
is not obvious. The powerful sub method of
re objects (from the re module
in the standard library) makes regular expressions particularly good
at performing string substitutions. Here is a function returning a
modified copy of an input string, where each occurrence of any string
that's a key in a given dictionary is replaced by
the corresponding value in the dictionary:

import re
def multiple_replace(text, adict):
rx = re.compile('|'.join(map(re.escape, adict)))
def one_xlat(match):
return adict[match.group(0)]
return rx.sub(one_xlat, text)

Discussion

This recipe shows how to
use the Python standard re module to perform
single-pass multiple-string substitution using a dictionary.
Let's say you have a dictionary-based mapping
between strings. The keys are the set of strings you want to replace,
and the corresponding values are the strings with which to replace
them. You could perform the substitution by calling the string method
replace for each key/value pair in the dictionary,
thus processing and creating a new copy of the entire text several
times, but it is clearly better and faster to do all the changes in a
single pass, processing and creating a copy of the text only once.
re.sub's callback facility makes
this better approach quite easy.

First, we have to build a regular expression from the set of keys we
want to match. Such a regular expression has a pattern of the form
a1|a2|...|aN, made up of the
N strings to be substituted, joined by
vertical bars, and it can easily be generated using a one-liner, as
shown in the recipe. Then, instead of giving
re.sub a replacement string, we pass it a callback
argument. re.sub then calls this object for each
match, with a re.MatchObject instance as its only
argument, and it expects the replacement string for that match as the
call's result. In our case, the callback just has to
look up the matched text in the dictionary and return the
corresponding value.

The function
multiple_replace presented in the recipe recomputes
the regular expression and redefines the one_xlat
auxiliary function each time you call it. Often, you must perform
substitutions on multiple strings based on the same, unchanging
translation dictionary and would prefer to pay these setup prices
only once. For such needs, you may prefer the following closure-based
approach:

import re
def make_xlat(*args, **kwds):
adict = dict(*args, **kwds)
rx = re.compile('|'.join(map(re.escape, adict)))
def one_xlat(match):
return adict[match.group(0)]
def xlat(text):
return rx.sub(one_xlat, text)
return xlat

You can
call make_xlat, passing as its argument a
dictionary, or any other combination of arguments you could pass to
built-in dict in order to construct a dictionary;
make_xlat returns a xlat
closure that takes as its only argument text the
string on which the substitutions are desired and returns a copy of
text with all the substitutions performed.

Here's a usage example for each half of this recipe.
We would normally have such an example as a part of the same
.py source file as the functions in the recipe,
so it is guarded by the traditional Python idiom that runs it if and
only if the module is called as a main script:

if _ _name_ _ == "_ _main_ _":
text = "Larry Wall is the creator of Perl"
adict = {
"Larry Wall" : "Guido van Rossum",
"creator" : "Benevolent Dictator for Life",
"Perl" : "Python",
}
print multiple_replace(text, adict)
translate = make_xlat(adict)
print translate(text)

Substitutions such as those performed by this recipe are often
intended to operate on entire words, rather than on arbitrary
substrings. Regular expressions are good at picking up the beginnings
and endings of words, thanks to the special sequence
r'\b'. We can easily make customized versions of
either multiple_replace or
make_xlat by simply changing the one line in which
each of them builds and assigns the regular expression object
rx into a slightly different form:

  rx = re.compile(r'\b%s\b' % r'\b|\b'.join(map(re.escape, adict)))

The rest of the code is just the same as shown earlier in this
recipe. However, this sameness is not necessarily good news: it
suggests that if we need many similarly customized versions, each
building the regular expression in slightly different ways,
we'll end up doing a lot of copy-and-paste coding,
which is the worst form of code reuse, likely to lead to high
maintenance costs in the future.

A key rule of good coding is: "once, and only
once!" When we notice that we are duplicating code,
we should notice this symptom as a "code
smell," and refactor our code for better reuse. In
this case, for ease of customization, we need a class rather than a
function or closure. For example, here's how to
write a class that works very similarly to make_xlat
but can be customized by subclassing and overriding:

class make_xlat:
def _ _init_ _(self, *args, **kwds):
self.adict = dict(*args, **kwds)
self.rx = self.make_rx( )
def make_rx(self):
return re.compile('|'.join(map(re.escape, self.adict)))
def one_xlat(self, match):
return self.adict[match.group(0)]
def _ _call_ _(self, text):
return self.rx.sub(self.one_xlat, text)

This is a "drop-in replacement" for
the function of the same name: in other words, a snippet such as the
one we showed, with the if _ _name_ _ == '_ _main_ _' guard, works identically when make_xlat
is this class rather than the previously shown function. The function
is simpler and faster, but the class' important
advantage is that it can easily be customized in the usual
object-oriented waysubclassing it, and overriding some method.
To translate by whole words, for example, all we need to code is:

class make_xlat_by_whole_words(make_xlat):
def make_rx(self):
return re.compile(r'\b%s\b' % r'\b|\b'.join(map(re.escape, self.adict)))

Ease of customization by subclassing and overriding helps you avoid
copy-and-paste coding, and this is sometimes an excellent reason to
prefer object-oriented structures over simpler functional structures,
such as closures. Of course, just because some functionality is
packaged as a class doesn't magically make it
customizable in just the way you want. Customizability also requires
some foresight in dividing the functionality into separately
overridable methods that correspond to the right pieces of overall
functionality. Fortunately, you don't have to get it
right the first time; when code does not have the optimal internal
structure for the task at hand (in this specific example, for reuse
by subclassing and selective overriding), you can and should refactor
the code so that its internal structure serves your needs. Just make
sure you have a suitable battery of tests ready to run to ensure that
your refactoring hasn't broken anything, and then
you can refactor to your heart's content. See
http://www.refactoring.com for
more information on the important art and practice of refactoring.

Python Cookbook 2Nd Edition Jun 1002005 [Electronic resources] نسخه متنی

فارسی

کردی

العربیه

اردو

Türkçe

Русский

English

Français

کانال فیلم من

تبیان من

فایلهای من

کتابخانه من

پنل پیامکی

وبلاگ من

اینجــــا یک کتابخانه دیجیتالی است

با بیش از 100000 منبع الکترونیکی رایگان به زبان فارسی ، عربی و انگلیسی