Python Cookbook 2nd Edition Jun 1002005 [Electronic resources] (text version)

David Ascher, Alex Martelli, Anna Ravenscroft

Recipe 2.1. Reading from a File


Credit: Luther Blissett


Problem


You want to read text or data from a file.


Solution



Here's the most
convenient way to read all of the file's contents at
once into one long string:

all_the_text = open('thefile.txt').read()       # all text from a text file
all_the_data = open('abinfile', 'rb').read()    # all data from a binary file

However, it is safer to bind the file object to a name, so that you
can call close on it as soon as
you're done, to avoid ending up with open files
hanging around. For example, for a text file:

file_object = open('thefile.txt')
try:
    all_the_text = file_object.read()
finally:
    file_object.close()

You don't necessarily have to use the
try/finally statement here, but
it's a good idea to use it, because it ensures the
file gets closed even when an error occurs during reading.
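
In Python 2.5 and later (including all of Python 3), the with statement gives the same guarantee in a more compact form. The following is a minimal sketch rather than part of the original recipe; it assumes Python 3, and the temporary file and its contents are made up so that the example runs on its own.

```python
import os
import tempfile

# Create a small throwaway text file so the sketch is self-contained
# (the file and its contents are illustrative assumptions).
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as f:
    f.write('first line\nsecond line\n')

# The with statement closes the file when the block exits,
# whether it exits normally or because of an exception.
with open(path) as file_object:
    all_the_text = file_object.read()

print(file_object.closed)    # the file is already closed at this point
print(repr(all_the_text))

os.remove(path)
```

The name file_object stays bound after the block, but the file it refers to is closed, so no explicit call to close is needed.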

The simplest, fastest, and most Pythonic way to read a text
file's contents at once as a list of strings, one
per line, is:

list_of_all_the_lines = file_object.readlines()

This leaves a '\n' at the end of each line; if you
don't want that, you have alternatives, such as:

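# Three alternatives; each one consumes the file, so use only one of them.
list_of_all_the_lines = file_object.read().splitlines()
list_of_all_the_lines = file_object.read().split('\n')   # leaves a final '' if the text ends with '\n'
list_of_all_the_lines = [L.rstrip('\n') for L in file_object]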
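
These alternatives are not quite interchangeable. The sketch below, which assumes Python 3 and uses an in-memory io.StringIO object as a stand-in for a real text file, shows how each one treats a text whose last line ends with a newline.

```python
import io

# Stand-in for the contents of a text file whose last line ends with '\n'.
text = 'alpha\nbeta\ngamma\n'

with_splitlines = text.splitlines()
with_split = text.split('\n')
# io.StringIO yields lines the same way an open text file does.
with_rstrip = [L.rstrip('\n') for L in io.StringIO(text)]

print(with_splitlines)
print(with_split)
print(with_rstrip)
```

Only split('\n') produces a trailing empty string here, because it treats the final newline as a separator with nothing after it.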

The simplest and fastest way to process a text file one line at a
time is simply to loop on the file object with a
for statement:

for line in file_object:
    process(line)    # process stands for whatever you do with each line

This approach also leaves a '\n' at the end of
each line; you may remove it by starting the for
loop's body with:

    line = line.rstrip('\n')

or even, when you're OK with getting rid of trailing
whitespace from each line (not just a trailing
'\n'), the generally handier:

    line = line.rstrip()
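
The difference between the two calls shows up only when a line carries other trailing whitespace. A small sketch, with a made-up sample line:

```python
# A sample line with trailing spaces and a tab before the newline
# (the content is an illustrative assumption).
line = '  name = value  \t\n'

only_newline_removed = line.rstrip('\n')
all_trailing_whitespace_removed = line.rstrip()

print(repr(only_newline_removed))
print(repr(all_trailing_whitespace_removed))
```

Both calls leave leading whitespace alone; rstrip works only on the right-hand end of the string.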


Discussion



Unless the file you're
reading is truly huge, slurping it all into memory in one gulp is
often fastest and most convenient for any further processing. The
built-in function open creates a Python file
object (alternatively, you can equivalently call the built-in type
file). You call the read method
on that object to get all of the contents (whether text or binary) as
a single long string. If the contents are text, you may choose to
immediately split that string into a list of lines with the
split method or the specialized
splitlines method. Since splitting into lines is
frequently needed, you may also call readlines
directly on the file object for faster, more convenient
operation.

You can also loop directly on the file object, or pass it to
callables that require an iterable, such as list
or max. When thus treated as an iterable, a
file object open for reading has the file's text
lines as the iteration items (therefore, this should be done for text
files only). This kind of line-by-line iteration is cheap in terms of
memory consumption and fairly speedy too.
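
As a rough illustration of passing a file object to callables that expect an iterable, the sketch below finds the longest line with max and collects all lines with list. It assumes Python 3; the temporary file and its contents are invented for the example.

```python
import os
import tempfile

# Throwaway text file so the sketch runs on its own.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as f:
    f.write('pear\nbanana\nfig\n')

# max consumes the file object line by line.
with open(path) as file_object:
    longest_line = max(file_object, key=len)

# Iterating exhausts the file object, so reopen before iterating again.
with open(path) as file_object:
    all_lines = list(file_object)

os.remove(path)
print(repr(longest_line))
print(all_lines)
```

Each item still carries its trailing '\n', exactly as in a for loop over the file object.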

On Unix and Unix-like systems, such as Linux, Mac OS X, and other BSD
variants, there is no real distinction between text files and binary
data files. On Windows and very old Macintosh systems, however, line
terminators in text files are encoded, not with the standard
'\n' separator, but with '\r\n'
and '\r', respectively. Python translates these
line-termination characters into '\n' on your
behalf. This means that you need to tell Python when you open a
binary file, so that it won't perform such
translation. To do so, use 'rb' as the second
argument to open. This is innocuous even on
Unix-like platforms, and it's a good habit to
distinguish binary files from text files even there, although
it's not mandatory in that case. Such good habits
will make your programs more immediately understandable, as well as
more compatible with different
platforms.
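
The effect of the mode argument can be checked directly. The sketch below assumes Python 3, where the distinction is sharper than in the Python 2 releases this recipe was written for: binary mode returns bytes untouched, while text mode returns str and translates line terminators on every platform. The file contents are made up.

```python
import os
import tempfile

# Write Windows-style line endings verbatim, using binary mode.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'one\r\ntwo\r\n')

# Binary mode: the bytes come back exactly as stored.
with open(path, 'rb') as f:
    raw = f.read()

# Text mode: '\r\n' is translated to '\n' while reading.
with open(path, 'r', encoding='ascii') as f:
    text = f.read()

os.remove(path)
print(repr(raw))
print(repr(text))
```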

If you're unsure about which line-termination
convention a certain text file might be using, use
'rU' as the second argument to
open, requesting universal endline translation.
This lets you freely interchange text files among Windows, Unix
(including Mac OS X), and old Macintosh systems, without worries: all
kinds of line-ending conventions get mapped to
'\n', whatever platform your code is running on.
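
In Python 3, universal newline translation is the default for text mode, and the 'U' mode flag was deprecated and then removed in Python 3.11. The following sketch therefore assumes Python 3 and uses the newline parameter of open instead of 'rU'; the file contents are invented to mix all three conventions.

```python
import os
import tempfile

# One line in each convention: Unix, Windows, and old Macintosh.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'unix\nwindows\r\nold mac\r')

# Default (newline=None): every terminator is mapped to '\n'.
with open(path, 'r', encoding='ascii') as f:
    translated = list(f)

# newline='': lines are still split on any terminator,
# but the terminators are returned unchanged.
with open(path, 'r', encoding='ascii', newline='') as f:
    untranslated = list(f)

os.remove(path)
print(translated)
print(untranslated)
```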

You can call methods such as read directly on the
file object produced by the open function, as
shown in the first snippet of the solution. When you do so, you no
longer have a reference to the file object as soon as the reading
operation finishes. In practice, Python notices the lack of a
reference at once, and immediately closes the file. However, it is
better to bind a name to the result of open, so
that you can call close yourself explicitly when
you are done with the file. This ensures that the file stays open for
as short a time as possible, even on platforms such as Jython,
IronPython, and other hypothetical future versions of Python, on
which more advanced garbage-collection mechanisms might delay the
automatic closing that the current version of C-based Python performs
at once. To ensure that a file object is closed even if errors happen
during its processing, the most solid and prudent approach is to use
the try/finally statement:

file_object = open('thefile.txt')
try:
    for line in file_object:
        process(line)    # process stands for whatever you do with each line
finally:
    file_object.close()

Be careful not to place the call to
open inside the
try clause of this
try/finally statement (a rather
common error among beginners). If an error occurs during the opening,
there is nothing to close, and besides, nothing gets bound to name
file_object, so you definitely
don't want to call file_object.close()!
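
To see why the placement of open matters, consider this sketch, which assumes Python 3. The function names read_badly and read_properly are hypothetical, and the path points into a fresh, empty temporary directory so that the file is certain not to exist.

```python
import os
import tempfile

def read_badly(filename):
    # Anti-pattern: open() sits inside the try clause.
    try:
        file_object = open(filename)
        return file_object.read()
    finally:
        # If open() raised, file_object was never bound, so this line
        # raises its own error on top of the original one.
        file_object.close()

def read_properly(filename):
    file_object = open(filename)    # a failure here propagates untouched
    try:
        return file_object.read()
    finally:
        file_object.close()

# A path inside a fresh, empty temporary directory, so it cannot exist.
empty_dir = tempfile.mkdtemp()
missing = os.path.join(empty_dir, 'missing.txt')

errors = {}
for reader in (read_badly, read_properly):
    try:
        reader(missing)
    except Exception as err:
        errors[reader.__name__] = type(err).__name__

os.rmdir(empty_dir)
print(errors)
```

With the call to open inside the try clause, the caller sees an error about an unbound name rather than the more useful report that the file could not be found.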

If you choose to read the file a little at a time, rather than all at
once, the idioms are different. Here's one way to
read a binary file 100 bytes at a time, until you reach the end of
the file:

file_object = open('abinfile', 'rb')
try:
    while True:
        chunk = file_object.read(100)
        if not chunk:
            break
        do_something_with(chunk)
finally:
    file_object.close()

Passing an argument N to the
read method ensures that read
will read only the next N bytes (or fewer,
if the file is closer to the end). read returns
the empty string when it reaches the end of the file. Complicated
loops are best encapsulated as reusable generators. In this case, we
can encapsulate the logic only partially, because in Python 2.4 and
earlier a generator's yield keyword is not
allowed in the try clause of a
try/finally statement (Python 2.5 lifted
this restriction). Giving
up on the assurance of file closing afforded by
try/finally, we can therefore
settle for:

def read_file_by_chunks(filename, chunksize=100):
    file_object = open(filename, 'rb')
    while True:
        chunk = file_object.read(chunksize)
        if not chunk:
            break
        yield chunk
    file_object.close()
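
Since Python 2.5, yield is allowed inside try/finally and inside a with block, so a generator no longer has to give up the closing guarantee. The following is a sketch under that assumption, written for Python 3; the name read_file_by_chunks_closing and the temporary test file are hypothetical and not part of the original recipe.

```python
import os
import tempfile

def read_file_by_chunks_closing(filename, chunksize=100):
    # The with block closes the file when the generator finishes,
    # and also when an abandoned generator is closed or collected.
    with open(filename, 'rb') as file_object:
        while True:
            chunk = file_object.read(chunksize)
            if not chunk:
                break
            yield chunk

# Throwaway binary file of 250 bytes to exercise the generator.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'x' * 250)

chunk_sizes = [len(chunk) for chunk in read_file_by_chunks_closing(path)]
os.remove(path)
print(chunk_sizes)   # [100, 100, 50]
```

The last chunk is shorter than chunksize because read returns fewer bytes when the file is about to end.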

Once this read_file_by_chunks generator is
available, your application code to read and process a binary file by
fixed-size chunks becomes extremely simple:

for chunk in read_file_by_chunks('abinfile'):
    do_something_with(chunk)
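
Where a separate generator function feels like more than the task needs, the two-argument form of the built-in iter offers a similar loop: it keeps calling a function until the result equals a sentinel value. The sketch below assumes Python 3, where a binary file signals its end with the empty bytes object b''; the file contents and the chunk size of 4 are chosen only for illustration.

```python
import functools
import os
import tempfile

# Throwaway binary file holding the ten bytes 0..9.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(bytes(range(10)))

with open(path, 'rb') as file_object:
    # read_four() is file_object.read(4); iter calls it repeatedly
    # and stops as soon as it returns the sentinel b''.
    read_four = functools.partial(file_object.read, 4)
    chunks = list(iter(read_four, b''))

os.remove(path)
print([len(chunk) for chunk in chunks])
```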

Reading a text file one line at a time is a frequent task. Just loop
on the file object, as in:

for line in open('thefile.txt', 'rU'):
    do_something_with(line)

Here, too, in order to be 100% certain that no uselessly open file
object will ever be left just hanging around, you may want to code
this snippet in a more rigorously correct and prudent way:

file_object = open('thefile.txt', 'rU')
try:
    for line in file_object:
        do_something_with(line)
finally:
    file_object.close()


See Also


Recipe 2.2; documentation for the open built-in function and
file objects in the Library Reference and Python in a Nutshell.
