Python Cookbook 2nd Edition Jun 1002005 [Electronic resources], text version

David Ascher, Alex Martelli, Anna Ravenscroft


Recipe 2.6. Processing Every Word in a File


Credit: Luther Blissett


Problem



You need to do something with each and
every word in a file.


Solution


This task is best handled by two nested loops, one on lines and
another on the words in each line:

    for line in open(thefilepath):
        for word in line.split():
            dosomethingwith(word)

The nested for statement's header, with its call to line.split(),
implicitly defines words as sequences of non-whitespace characters
separated by runs of whitespace (just as the Unix program wc
does). For other definitions of words, you can use regular
expressions. For example:

    import re
    re_word = re.compile(r"[\w'-]+")
    for line in open(thefilepath):
        for word in re_word.finditer(line):
            dosomethingwith(word.group(0))

In this case, a word is defined as a maximal sequence of
alphanumerics (plus underscores, which \w also matches), hyphens,
and apostrophes.
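To see how the two definitions of a word differ in practice, here is a small sketch that applies both to a single string rather than to a file. The sample text and the variable names are illustrative assumptions, not part of the recipe.

```python
import re

# Hypothetical sample line, chosen to show where the two definitions differ.
sample = "Don't panic: the well-known answer is 42!"

# Whitespace definition: punctuation stays attached to the words.
by_whitespace = sample.split()

# Regular-expression definition: maximal runs of \w characters,
# apostrophes, and hyphens; other punctuation is dropped.
re_word = re.compile(r"[\w'-]+")
by_regex = [mo.group(0) for mo in re_word.finditer(sample)]

print(by_whitespace)
print(by_regex)
```

With the whitespace definition, "panic:" and "42!" keep their trailing punctuation; with the regular expression, they come out as "panic" and "42", while "Don't" and "well-known" survive intact under both.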


Discussion


If you want to use other definitions of words, you will obviously
need different regular expressions. The outer loop, on all lines in
the file, won't change.

It's often a good idea to wrap iterations as
iterator objects, and this kind of wrapping is most commonly and
conveniently obtained by coding simple generators:

    def words_of_file(thefilepath, line_to_words=str.split):
        the_file = open(thefilepath)
        for line in the_file:
            # line_to_words turns one line into an iterable of words
            for word in line_to_words(line):
                yield word
        # reached once every line has been consumed
        the_file.close()

    for word in words_of_file(thefilepath):
        dosomethingwith(word)

This approach lets you separate, cleanly and effectively, two
different concerns: how to iterate over all items (in this case,
words in a file) and what to do with each item in the iteration. Once
you have cleanly encapsulated iteration concerns in an iterator
object (often, as here, a generator), most of your uses of iteration
become simple for statements. You can often reuse
the iterator in many spots in your program, and if maintenance is
ever needed, you can perform that maintenance in just one
place (the definition of the iterator) rather than having
to hunt for all uses. The advantages are thus very similar to those
you obtain in any programming language by appropriately defining and
using functions, rather than copying and pasting pieces of code all
over the place. With Python's iterators, you can get
these reuse advantages for all of your looping-control structures,
too.
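As a rough illustration of that reuse, the sketch below feeds the same generator to two unrelated consumers. The temporary sample file, the variable names, and the use of a with block inside the generator are assumptions made for this example; the with block is simply another way to make sure the file gets closed.

```python
import os
import tempfile
from collections import Counter

def words_of_file(thefilepath, line_to_words=str.split):
    # Same shape as the recipe's generator, written with a `with` block
    # (an assumption of this sketch) so the file is closed on exit.
    with open(thefilepath) as the_file:
        for line in the_file:
            for word in line_to_words(line):
                yield word

# Hypothetical sample file, created only for this demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("the quick brown fox\njumps over the lazy dog\n")
    sample_path = tmp.name

# Two unrelated uses of the same iterator; neither repeats the
# nested-loop logic for walking lines and words.
word_counts = Counter(words_of_file(sample_path))
longest_word = max(words_of_file(sample_path), key=len)

os.remove(sample_path)  # clean up the demonstration file
```

Both consumers treat the file as a flat stream of words. If the way lines are read or split ever has to change, only words_of_file needs editing.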

We've taken the
opportunity afforded by the refactoring of the loop into a generator
to perform two minor enhancements: ensuring the file is
explicitly closed, which is always a good idea, and generalizing the
way each line is split into words (defaulting to the
split method of string objects, but leaving a door
open to more generality). For example, when we need words as defined
by a regular expression, we can code another wrapper on top of
words_of_file thanks to this
"hook":

    import re
    def words_by_re(thefilepath, repattern=r"[\w'-]+"):
        wre = re.compile(repattern)
        def line_to_words(line):
            # yield (not return) every match, so that the whole
            # line is split into words, not just its first match
            for mo in wre.finditer(line):
                yield mo.group(0)
        return words_of_file(thefilepath, line_to_words)

Here, too, we supply a reasonable default for the regular expression
pattern defining a word but still make it easy to pass a different
value in those cases in which different definitions are necessary.
Excessive generalization is a pernicious temptation, but a little
tasteful generalization suggested by experience will most often amply
repay the modest effort it requires. Having a function accept an
optional argument, while providing the most likely value for the
argument as the default value, is among the simplest and handiest
ways to implement this modest and often worthwhile kind of
generalization.
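The following sketch shows the optional argument at work: the same function is called once with its default pattern and once with a stricter one. The sample file contents and the letters-only pattern are assumptions chosen for illustration, and the two functions are restated (using a with block) so that the example runs on its own.

```python
import os
import re
import tempfile

def words_of_file(thefilepath, line_to_words=str.split):
    # Restated here so the sketch is self-contained.
    with open(thefilepath) as the_file:
        for line in the_file:
            for word in line_to_words(line):
                yield word

def words_by_re(thefilepath, repattern=r"[\w'-]+"):
    wre = re.compile(repattern)
    def line_to_words(line):
        for mo in wre.finditer(line):
            yield mo.group(0)
    return words_of_file(thefilepath, line_to_words)

# Hypothetical sample file, created only for this demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("state-of-the-art tools, circa 2005\n")
    sample_path = tmp.name

# Default pattern: hyphenated compounds and digits count as words.
default_words = list(words_by_re(sample_path))

# Caller-supplied pattern (an assumption of this sketch): letters only.
letters_only = list(words_by_re(sample_path, r"[A-Za-z]+"))

os.remove(sample_path)  # clean up the demonstration file
```

The default call keeps "state-of-the-art" and "2005" as single words, while the letters-only call splits the compound into four words and drops the number. The caller changes the definition of a word without touching either function.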


See Also


Chapter 19 for more on iterators and generators; Library Reference
and Python in a Nutshell on file objects and the re module;
Perl Cookbook recipe 8.3.

