Python Cookbook 2nd Edition Jun 1002005 [Electronic resources] - text version

David Ascher, Alex Martelli, Anna Ravenscroft

Recipe 1.15. Expanding and Compressing Tabs


Credit: Alex Martelli, David Ascher


Problem


You want to convert tabs in a string to the appropriate number of
spaces, or vice versa.


Solution


Changing
tabs to the appropriate number of spaces is a reasonably frequent
task, easily accomplished with Python strings'
expandtabs method. Because strings are immutable,
the method returns a new string object, a modified copy of the
original one. However, it's easy to rebind a string
variable name from the original to the modified-copy value:

mystring = mystring.expandtabs( )

This doesn't change the string object to which
mystring originally referred, but it does
rebind the name mystring to a newly
created string object, a modified copy of
mystring in which tabs are expanded into
runs of spaces. expandtabs, by default, uses a tab
length of 8; you can pass expandtabs an integer
argument to use as the tab length.
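
As a small illustration of the points above, the following sketch expands the same string with the default tab length and with a tab length of 4, and shows that the original string is left untouched. The sample string and the variable names are made up for the example.

```python
# Tabs advance to the next tab stop, so the number of spaces
# produced depends on the column at which each tab occurs.
original = "a\tbc\td"

default_width = original.expandtabs()    # tab stops every 8 columns
narrow_width = original.expandtabs(4)    # tab stops every 4 columns

print(repr(default_width))
print(repr(narrow_width))
print(repr(original))                    # the original still contains tabs
```

Note that each tab becomes however many spaces are needed to reach the next multiple of the tab length, not a fixed number of spaces.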

Changing spaces into tabs is a rare and peculiar need. Compression,
if that's what you're after, is far
better performed in other ways, so Python doesn't
offer a built-in way to "unexpand"
spaces into tabs. We can, of course, write our own function for the
purpose. String processing tends to be fastest in a
split/process/rejoin approach, rather than with repeated overall
string transformations:

import re

def unexpand(astring, tablen=8):
    # split into alternating non-space and space sequences
    pieces = re.split(r'( +)', astring.expandtabs(tablen))
    # keep track of the total length of the string so far
    lensofar = 0
    for i, piece in enumerate(pieces):
        thislen = len(piece)
        lensofar += thislen
        if piece.startswith(' '):
            # change each run of spaces into tabs + spaces; a tab can be
            # used only if the run reaches at least one tab stop
            numtabs = lensofar // tablen - (lensofar - thislen) // tablen
            if numtabs:
                numblanks = lensofar % tablen
            else:
                numblanks = thislen
            pieces[i] = '\t'*numtabs + ' '*numblanks
    return ''.join(pieces)

Function unexpand, as written in this example, works
only for a single-line string; to deal with a multiline string, use
''.join([unexpand(s) for s in astring.splitlines(True)]).


Discussion


While regular
expressions are never indispensable for the purpose of manipulating
strings in Python, they are occasionally quite handy. Function
unexpand, as presented in the recipe, for example,
takes advantage of one extra feature of re.split
with respect to string's
split method: when the regular expression contains
a (parenthesized) group,
re.split returns a list where the split pieces are
interleaved with the "splitter"
pieces. So, here, we get alternate runs of nonblanks and blanks as
items of list pieces; the
for loop keeps track of the length of the string it
has seen so far, and changes pieces that are made of blanks to as
many tabs as possible, plus as many blanks as are needed to maintain
the overall length.
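
The difference a parenthesized group makes to re.split can be seen in a short sketch like the one below. The sample line and variable names are invented for illustration only.

```python
import re

line = "ab   cd  e"

# Without a group, the runs of spaces that separate the pieces are discarded.
without_group = re.split(r' +', line)

# With a parenthesized group, each separator is kept as its own list item,
# so non-blank and blank runs alternate.
with_group = re.split(r'( +)', line)

print(without_group)   # ['ab', 'cd', 'e']
print(with_group)      # ['ab', '   ', 'cd', '  ', 'e']

# Because nothing is lost, joining the pieces rebuilds the original string.
print(''.join(with_group) == line)   # True
```

Keeping the separators is what lets a loop over the list track the running column position exactly.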

Some programming tasks that could still be described as
expanding tabs are unfortunately not quite as
easy as just calling the expandtabs method. One that
comes up with some regularity is fixing Python source files
that use a mix of tabs and spaces for indentation (a
very bad idea), so that they instead use spaces only (which is the
best approach). This can entail extra complications, for example,
when you need to guess the tab length (and want to end up with the
standard four spaces per indentation level, which is strongly
advisable). Complications also arise when you need to preserve tabs
that are inside strings, rather than tabs being used for indentation
(because somebody erroneously used actual tabs, rather than
'\t', to indicate tabs in strings), or even
because you're asked to treat docstrings differently
from other strings. Some cases are not too bad: for example,
when you want to expand tabs that occur only within runs of
whitespace at the start of each line, leaving any other tab alone. A
little function using a regular expression suffices:

import re

def expand_at_linestart(P, tablen=8):
    def exp(mo):
        # expand tabs only within the matched leading whitespace
        return mo.group().expandtabs(tablen)
    return ''.join([re.sub(r'^\s+', exp, s) for s in P.splitlines(True)])

This function expand_at_linestart exploits the
re.sub function, which looks for a regular
expression in a string and, each time it gets a match, calls a
function, passing the match object as the argument, to obtain the
string to substitute in place of the match. For convenience,
expand_at_linestart is coded to deal with a
multiline string argument P, performing the list
comprehension over the results of the splitlines
call, and the ''.join of the whole. Of course,
this convenience does not stop the function from being able to deal
with a single-line P.
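
A short usage sketch may make the behavior concrete. The function is restated here, using the string method expandtabs, so that the block runs on its own; the sample source text and the choice of tablen=4 are assumptions made for the example.

```python
import re

def expand_at_linestart(P, tablen=8):
    def exp(mo):
        # mo.group() is the run of leading whitespace matched on this line
        return mo.group().expandtabs(tablen)
    return ''.join([re.sub(r'^\s+', exp, s) for s in P.splitlines(True)])

# Two tab-indented lines; the second also has a raw tab inside a string.
source = "\tx = 1\n\t\ty = 'a\tb'\nz = 3\n"
result = expand_at_linestart(source, tablen=4)

print(repr(result))
```

Only the tabs at the start of each line are replaced; the tab inside the quoted string on the second line is left alone.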

If your specifications regarding which tabs are to be expanded are
even more complex, such as needing to deal differently with tabs
depending on whether they're inside or outside of
strings, and on whether or not strings are docstrings, at the very
least, you need to perform a tokenization. In addition, you may also
have to perform a full parse of the source code
you're dealing with, rather than using simple string
or regular-expression operations. If this is the case, you can expect
a substantial amount of work. Some beginning pointers to help you get
started may be found in Chapter 16.
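
As one possible starting point for the tokenization step, the sketch below uses the standard library's tokenize module to locate string literals that contain an actual tab character. The choice of tokenize, the helper name strings_with_literal_tabs, and the sample source are assumptions made for illustration; the recipe itself does not prescribe a particular tool. On recent Python versions, f-strings are reported with different token types, so this sketch covers ordinary string literals only.

```python
import io
import tokenize

def strings_with_literal_tabs(source):
    """Return the (row, column) start of each string token holding a raw tab."""
    found = []
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        # tok.string is the literal exactly as written in the source,
        # so an escape sequence such as \t does not count as a raw tab.
        if tok.type == tokenize.STRING and '\t' in tok.string:
            found.append(tok.start)
    return found

# Hypothetical sample: tab indentation plus a raw tab inside a string.
sample = "def f():\n\tx = 'a\tb'\n\treturn x\n"
print(strings_with_literal_tabs(sample))
```

Knowing where such tokens start and end is enough to expand the remaining tabs while leaving the ones inside strings untouched, though deciding which strings are docstrings would still need more analysis.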

If you ever find yourself sweating out this kind of task, you will no
doubt get excellent motivation in the future for following the normal
and recommended Python style in the source code you write or edit:
only spaces, four per indentation level, no tabs, and always
'\t', never an actual tab character, to include a
tab in a string literal. Your favorite editor can no doubt be told to
enforce all of these conventions whenever a Python source file is
saved; the editor that comes with IDLE (the free integrated
development environment that comes with Python), for example,
supports these conventions. It is much easier to
arrange your editor so that the problem never arises, rather than
striving to fix it after the fact!


See Also


Documentation for the expandtabs method of strings
in the "Sequence Types" section of
the Library Reference; Perl Cookbook recipe 1.7;
Library Reference and Python in a Nutshell
documentation of module re.

