Python Cookbook 2nd Edition Jun 1002005 [Electronic resources] (text version)

David Ascher, Alex Martelli, Anna Ravenscroft

Recipe 1.6. Combining Strings


Credit: Luther Blissett


Problem


You have several small strings that you need to combine into one
larger string.


Solution


To join a sequence of small strings into one large string, use the
string method join. Say that
pieces is a list whose items are strings, and you
want one big string with all the items concatenated in order; then,
you should code:

largeString = ''.join(pieces)

To put together pieces stored in a few variables, the
string-formatting operator % can often be even
handier:

largeString = '%s%s something %s yet more' % (small1, small2, small3)


Discussion


In Python, the
+ operator concatenates strings and therefore
offers seemingly obvious solutions for putting small strings together
into a larger one. For example, when you have pieces stored in a few
variables, it seems quite natural to code something like:

largeString = small1 + small2 + ' something ' + small3 + ' yet more'

And similarly, when you have a sequence of small strings named
pieces, it seems quite natural to code
something like:

largeString = ''
for piece in pieces:
    largeString += piece

Or, equivalently, but more fancifully and compactly:

import operator
# reduce is a built-in in Python 2; in Python 3, first: from functools import reduce
largeString = reduce(operator.add, pieces, '')

However, it's very important to realize that none of
these seemingly obvious solutions is good; the approaches shown
in the "Solution" are
vastly superior.

In Python, string objects are immutable. Therefore, any operation on
a string, including string concatenation, produces a new string
object, rather than modifying an existing one. Concatenating
N strings thus involves building and then
immediately throwing away each of N-1
intermediate results. Performance is therefore vastly better for
operations that build no intermediate results, but rather produce the
desired end result at once.
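
As a small illustration of the immutability point, the sketch below (the variable names are illustrative only, and the code is written to run on current Python 3) shows that += binds a name to a new string rather than changing the existing object:

```python
# "+=" on a string builds a new string object and rebinds the name;
# the object the name used to refer to is left untouched.
original = 'spam'
alias = original           # a second name for the same string object
original += ' and eggs'    # creates a new string; 'spam' itself is not modified

print(alias)       # spam
print(original)    # spam and eggs
```

If strings were mutable, alias would have changed too; because they are not, every concatenation has to copy the characters accumulated so far into a fresh object.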

Python's string-formatting operator
% is one such operation, particularly suitable
when you have a few pieces (e.g., each bound to a different variable)
that you want to put together, perhaps with some constant text in
addition. Performance is not a major issue for this specific kind of
task. However, the % operator also has other
potential advantages, when compared to an expression that uses
multiple + operations on strings. % is more readable, once you get
used to it. Also, you don't have to call
str on pieces that aren't already
strings (e.g., numbers), because the format specifier
%s does so implicitly. Another advantage is that
you can use format specifiers other than %s, so
that, for example, you can control how many significant digits the
string form of a floating-point number should display.
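
The following sketch illustrates those advantages. The names count, price, and ratio are made up for the example, and the code assumes nothing beyond the standard % operator:

```python
count = 3
price = 2.5
ratio = 2.0 / 3.0

# %s applies str() to each argument implicitly, so numbers need no conversion
with_percent = '%s items at %s each' % (count, price)

# the equivalent using + needs an explicit str() on every non-string piece
with_plus = str(count) + ' items at ' + str(price) + ' each'

# a specifier other than %s: %.3f keeps three digits after the decimal point
three_digits = 'ratio: %.3f' % ratio

print(with_percent)    # 3 items at 2.5 each
print(three_digits)    # ratio: 0.667
```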


What Is "a Sequence?"


Python does not
have a specific type called sequence, but
sequence is still an often-used term in
Python. Strictly speaking, sequence means: a
container that can be iterated on, to get a finite number of items,
one at a time, and that also supports indexing,
slicing, and being passed to the built-in function
len (which gives the number of items in a
container). Python lists are the
"sequences" you'll
meet most often, but there are many others (strings, unicode objects,
tuples, array.arrays, etc.).

Often, one does not need indexing, slicing, and
len: the ability to iterate, one item at a
time, suffices. In that case, one should speak of an
iterable (or, to focus on the finite number of
items issue, a bounded iterable). Iterables that
are not sequences include dictionaries (iteration gives the
keys of the dictionary, one at a time in
arbitrary order), file objects (iteration gives the
lines of the text file, one at a time), and many
more, including iterators and generators. Any iterable can be used in
a for loop statement and in many equivalent
contexts (the for clause of a list comprehension
or Python 2.4 generator expression, and also many built-ins such as
min, max,
zip, sum,
str.join, etc.).
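
A minimal sketch of the distinction, using illustrative names and written for current Python 3: a generator expression supports iteration but not len, and str.join accepts it all the same.

```python
pieces = ['alpha', 'beta', 'gamma']     # a list is a true sequence

# a generator expression is a bounded iterable, but not a sequence
upper_pieces = (piece.upper() for piece in pieces)

try:
    len(upper_pieces)
    has_len = True
except TypeError:
    has_len = False                     # generators do not support len

# str.join only needs to iterate over its argument
joined = '-'.join(upper_pieces)

# iterating a dictionary yields its keys; sorted() fixes their order here
keys_joined = ''.join(sorted({'b': 1, 'a': 2}))

print(has_len)        # False
print(joined)         # ALPHA-BETA-GAMMA
print(keys_joined)    # ab
```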

At http://www.python.org/moin/PythonGlossary,
you can find a Python Glossary that can help
you with these and several other terms. However, while the editors of
this cookbook have tried to adhere to the word usage that the
glossary describes, you will still find many places where this book
says a sequence or an iterable or even a list, where, by
strict terminology, one should always say a bounded
iterable. For example, at the start of this
recipe's Solution, we say "a
sequence of small strings" where, in fact, any
bounded iterable of strings suffices. The problem with using
"bounded iterable" all over the
place is that it would make this book read more like a mathematics
textbook than a practical programming book! So, we have deviated from
terminological rigor where readability, and maintaining in the book a
variety of "voices", were better
served by slightly imprecise terminology that is nevertheless
entirely clear in context.

When you have many small string pieces in
a sequence, performance can become a truly important issue. The time
needed to execute a loop using + or
+= (or a fancier but equivalent approach using the
built-in function reduce) grows with the square of
the number of characters you are accumulating, since the time to
allocate and fill a large string is roughly proportional to the
length of that string. Fortunately, Python offers an excellent
alternative. The join method of a string object
s takes as its only argument a sequence of
strings and produces a string result obtained by concatenating all
items in the sequence, with a copy of s
joining each item to its neighbors. For example,
''.join(pieces) concatenates all the items of
pieces in a single gulp, without
interposing anything between them, and
', '.join(pieces) concatenates the items, putting a comma and a
space between each pair of them. It's the fastest,
neatest, and most elegant and readable way to put a large string
together.
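
For instance, with a hypothetical three-item list, the two calls just described behave as follows (a small sketch, not taken from the recipe itself):

```python
pieces = ['one', 'two', 'three']

no_separator = ''.join(pieces)         # nothing between the items
comma_separated = ', '.join(pieces)    # ', ' between each pair of items

# the separator appears only between items, never before or after them
single_item = ', '.join(['only'])
no_items = ', '.join([])

print(no_separator)       # onetwothree
print(comma_separated)    # one, two, three
```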

When the
pieces are not all available at the same time, but rather come in
sequentially from input or computation, use a list as an intermediate
data structure to hold the pieces (to add items at the end of a list,
you can call the append or
extend methods of the list). At the end, when the
list of pieces is complete, call ''.join(thelist)
to obtain the big string that's the concatenation of
all pieces. Of all the many handy tips and tricks I could give you
about Python strings, I consider this one by far
the most significant: the most frequent reason some Python programs
are too slow is that they build up big strings with
+ or +=. So, train yourself
never to do that. Use, instead, the ''.join
approach recommended in this recipe.
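
The accumulate-then-join pattern might look like the sketch below. The function squares_report and its output format are hypothetical, chosen only to stand in for pieces that arrive one at a time from some computation:

```python
def squares_report(limit):
    """Build one big string from pieces that are computed one at a time."""
    parts = []                                    # intermediate list of small strings
    for n in range(limit):
        parts.append('%d squared is %d\n' % (n, n * n))
    return ''.join(parts)                         # one concatenation, at the very end

report = squares_report(3)
print(report)
```

The list holds references to the small strings until the loop is done, so the only large string ever built is the final result.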

Python 2.4 makes a heroic attempt to ameliorate the issue, slightly
reducing the performance penalty due to such erroneous use of
+=. While ''.join is still way
faster and in all ways preferable, at least some newbie or careless
programmer gets to waste somewhat fewer machine cycles. Similarly,
psyco (a specializing just-in-time [JIT] Python compiler found at
http://psyco.sourceforge.net/),
can reduce the += penalty even further.
Nevertheless, ''.join remains the best approach in
all cases.
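
To see how the two approaches compare on a particular interpreter, one could time them with the standard timeit module. The helper names and the workload size below are assumptions made for this sketch, and the measured figures will vary with the Python version and machine, so no timings are quoted here:

```python
import timeit

def concat_with_plus(pieces):
    # the approach this recipe advises against: repeated +=
    result = ''
    for piece in pieces:
        result += piece
    return result

def concat_with_join(pieces):
    # the approach this recipe recommends
    return ''.join(pieces)

# hypothetical workload: 10,000 pieces of 10 characters each
sample = ['x' * 10] * 10000

plus_seconds = timeit.timeit(lambda: concat_with_plus(sample), number=20)
join_seconds = timeit.timeit(lambda: concat_with_join(sample), number=20)

print('+=   : %.4f seconds' % plus_seconds)
print('join : %.4f seconds' % join_seconds)
```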


See Also


The Library Reference and Python in a Nutshell
sections on string methods, string-formatting
operations, and the operator module.

