Python Cookbook 2nd Edition Jun 1002005 [Electronic resources] (text version)

David Ascher, Alex Martelli, Anna Ravenscroft


Recipe 19.12. Iterating on a Stream of Data Blocks as a Stream of Lines


Credit: Scott David Daniels, Peter Cogolo


Problem


You want to loop over all lines of a stream, but the stream arrives
as a sequence of data blocks of arbitrary size (e.g., from a network
socket).


Solution


We need to code a generator that gets blocks and yields lines:

def ilines(source_iterable, eol='\r\n', out_eol='\n'):
    tail = ''
    for block in source_iterable:
        # everything before the last EOL in tail+block is complete lines
        pieces = (tail + block).split(eol)
        # the last piece is incomplete (or empty): keep it for the next block
        tail = pieces.pop()
        for line in pieces:
            yield line + out_eol
    # emit any unterminated final line as-is
    if tail:
        yield tail

if __name__ == '__main__':
    s = 'one\r\ntwo\r,\nthree,four,five\r\n,six,\r\nseven\r\nlast'.split(',')
    for line in ilines(s):
        print(repr(line))

When run as a main script, this code emits:

'one\n'
'two\n'
'threefourfive\n'
'six\n'
'seven\n'
'last'
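
As a further illustration (not part of the original recipe), the sketch below re-chunks one piece of text at every possible block size and checks that the lines produced do not depend on where the block boundaries fall. The helper name fixed_blocks and the sample text are assumptions made for this example; ilines is repeated so the snippet runs on its own.

```python
def ilines(source_iterable, eol='\r\n', out_eol='\n'):
    tail = ''
    for block in source_iterable:
        pieces = (tail + block).split(eol)
        tail = pieces.pop()
        for line in pieces:
            yield line + out_eol
    if tail:
        yield tail

def fixed_blocks(text, size):
    # Hypothetical helper: cut text into blocks of at most `size` characters.
    for start in range(0, len(text), size):
        yield text[start:start + size]

text = 'alpha\r\nbeta\r\n\r\ngamma'
expected = ['alpha\n', 'beta\n', '\n', 'gamma']

# Try every block size from 1 character up to the whole text at once.
results = [list(ilines(fixed_blocks(text, size)))
           for size in range(1, len(text) + 1)]
all_match = all(result == expected for result in results)
print(all_match)
```

Block size 1 is the harshest case, since every EOL marker is then split across two blocks.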


Discussion


Many data sources produce their data in fits and
starts: sockets, RSS feeds, the results of expanding compressed
text, and (at its heart) most I/O. The data often
doesn't arrive at convenient boundaries, but you
nevertheless want to consume it in logical units. For text, the
logical units are often lines.

This recipe shows generator ilines, a simple way to
consume a source_iterable, which yields blocks of
data, producing an iterator that yields lines of text instead.
ilines is vastly simplified by assuming that lines
are separated, on input, by a known end-of-line (EOL) string (by
default '\r\n', which is the standard EOL marker in most
Internet protocols).
ilines' implementation is further
simplified by taking a high-level approach, relying on the
split method of Python's string
types to do most of the work. This basically leaves
ilines with the single task of
"buffering" data between successive
input blocks, on all occasions when a line starts in one block and
ends in a following one (including those occasions in which block
boundaries "split" an EOL marker).
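
Because the approach relies only on split and concatenation, it should carry over to byte strings, which is what a Python 3 socket delivers. The following is a minimal sketch under that assumption, not the recipe's own code: the name ilines_bytes and the sample blocks are hypothetical.

```python
def ilines_bytes(source_iterable, eol=b'\r\n', out_eol=b'\n'):
    # Same logic as ilines, but tail, eol, and out_eol are bytes objects.
    tail = b''
    for block in source_iterable:
        pieces = (tail + block).split(eol)
        tail = pieces.pop()
        for line in pieces:
            yield line + out_eol
    if tail:
        yield tail

# Hypothetical blocks, as they might arrive from successive recv() calls;
# the second block ends in the middle of an EOL marker.
blocks = [b'HELO exam', b'ple.org\r', b'\nQUIT\r\n']
received = list(ilines_bytes(blocks))
print(received)
```

Decoding each yielded line afterwards, rather than each incoming block, avoids splitting a multibyte character across a block boundary.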

ilines easily accomplishes its buffering task
through its local variable tail, which
starts empty and, at each iteration of the loop, holds whatever
followed the latest EOL marker seen so far. When tail+block
ends with an EOL marker, the expression
(tail+block).split(eol) produces a list whose last
item is an empty string (''), exactly what we
need; otherwise, the last item of the list is whatever followed the
last EOL, which again is exactly what we need.
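
The three cases that tail has to handle can be checked directly at the interpreter. The snippet below is only an illustration of how str.split behaves; the variable names are chosen for this example.

```python
eol = '\r\n'

# Text ending exactly on an EOL: the last piece is the empty string.
ends_on_eol = 'one\r\ntwo\r\n'.split(eol)

# Text ending in the middle of a line: the last piece is the partial line.
ends_mid_line = 'one\r\ntw'.split(eol)

# Text ending in the middle of the EOL marker itself: nothing is split yet,
# and the stray '\r' stays in the last piece until the '\n' arrives.
ends_mid_eol = 'one\r'.split(eol)

for pieces in (ends_on_eol, ends_mid_line, ends_mid_eol):
    tail = pieces.pop()  # what ilines would carry over to the next block
    print(repr(pieces), repr(tail))
```

In every case, popping the last item leaves only complete lines in the list.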

Python's built-in file objects
are even more powerful than ilines, since they
support a universal newlines reading mode
(mode 'U'), which is able to recognize and deal
with all common EOL markers (even when different markers are mixed
within the same stream!). However, ilines is more
flexible, since you may apply it in many situations where you have a
stream of arbitrary blocks of text and want to process it as a stream
of lines, with a known EOL marker.
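
To make the contrast concrete, here is a small sketch for Python 3, where universal newlines is the default behavior of text-mode streams (newline=None) rather than a separate 'U' mode. The sample data is made up for this example, and ilines is repeated so the snippet runs on its own.

```python
import io

def ilines(source_iterable, eol='\r\n', out_eol='\n'):
    tail = ''
    for block in source_iterable:
        pieces = (tail + block).split(eol)
        tail = pieces.pop()
        for line in pieces:
            yield line + out_eol
    if tail:
        yield tail

mixed = b'a\r\nb\rc\n'  # three different EOL markers in one stream

# A text wrapper with newline=None recognizes all three markers
# and translates each of them to '\n'.
wrapper = io.TextIOWrapper(io.BytesIO(mixed), encoding='ascii', newline=None)
universal = list(wrapper)

# ilines only knows the single EOL string it was given ('\r\n' here),
# so the lone '\r' and '\n' are left inside the final piece.
fixed = list(ilines([mixed.decode('ascii')]))

print(universal)
print(fixed)
```

The wrapper needs a file-like object to read from, while ilines accepts any iterable of text blocks, which is the flexibility described above.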


See Also


Library Reference and Python in a Nutshell docs about built-in
file objects; Chapter 2 for general issues about handling files.

