Recipe 19.11. Reading Lines with Continuation Characters
Credit: Alex Martelli
Problem
You have a file that includes long logical lines split over two or
more physical lines, with backslashes to indicate that a continuation
line follows. You want to process a sequence of logical lines,
"rejoining" those split lines.
Solution
As usual, our first idea for a problem involving sequences should be
a generator:

    def logical_lines(physical_lines, joiner=''.join):
        logical_line = []
        for line in physical_lines:
            stripped = line.rstrip()
            if stripped.endswith('\\'):
                # a line which continues w/the next physical line
                logical_line.append(stripped[:-1])
            else:
                # a line which does not continue, end of logical line
                logical_line.append(line)
                yield joiner(logical_line)
                logical_line = []
        if logical_line:
            # end of sequence implies end of last logical line
            yield joiner(logical_line)

    if __name__ == '__main__':
        text = 'some\\\n', 'lines\\\n', 'get\n', 'joined\\\n', 'up\n'
        for line in text:
            print('P:', repr(line))
        for line in logical_lines(text, ' '.join):
            print('L:', repr(line))

When run as a main script, this code emits:

    P: 'some\\\n'
    P: 'lines\\\n'
    P: 'get\n'
    P: 'joined\\\n'
    P: 'up\n'
    L: 'some lines get\n'
    L: 'joined up\n'
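
The generator accepts any iterable of lines, so an open file can be passed directly. The following is a minimal sketch rather than part of the original recipe: io.StringIO stands in for a real file, and the sample text is made up for illustration. It also exercises two behaviors of the code above: whitespace to the right of a trailing backslash is ignored, and a continuation left dangling at the end of the input is still yielded.

```python
import io

def logical_lines(physical_lines, joiner=''.join):
    # same generator as in the Solution, repeated so this sketch is self-contained
    logical_line = []
    for line in physical_lines:
        stripped = line.rstrip()
        if stripped.endswith('\\'):
            # continuation: drop the backslash and anything to its right
            logical_line.append(stripped[:-1])
        else:
            logical_line.append(line)
            yield joiner(logical_line)
            logical_line = []
    if logical_line:
        # input ended in the middle of a logical line
        yield joiner(logical_line)

# hypothetical sample data; io.StringIO stands in for an open text file
sample = io.StringIO('first \\   \nsecond\nlast \\\n')
result = list(logical_lines(sample))
# result is ['first second\n', 'last ']
```

Note that the last logical line has no trailing newline, because the newline that followed the final backslash was stripped along with it.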
Discussion
This problem is about sequence-bunching, just like the preceding
Recipe 19.10. It is
therefore not surprising that this recipe, like that one, is a
generator (with an internal structure quite similar to the one in the
"other" recipe): today, in Python,
sequences are often processed most simply and effectively by means of
generators.

In this recipe, the generator can encompass just a small amount of
generality without introducing extra complexity. Determining whether
a line is a continuation line, and how to proceed when it is, is
slightly too idiosyncratic to generalize in a simple and transparent
way. I have therefore chosen to code that functionality inline, in
the body of the logical_lines generator, rather than
"factoring it out" into separate
callables. Remember, generality is good, but simplicity is even more
important. However, I have kept the simple and transparent generality
obtained by passing the joiner function as an
argument, and the snippet of code under the if __name__ ==
'__main__' test demonstrates how we may want to use that
generality, for example, to join continuation lines with a space
rather than with an empty string.

If you are certain that the file you're processing
is sufficiently small to fit comfortably in your
computer's memory, with room to spare for
processing, and you don't need
the feature (offered in the version of logical_lines
shown in the "Solution") of
ignoring whitespace to the right of a terminating
backslash, a solution using a plain function rather than
a generator is simpler than the one shown in this
recipe's Solution:

    def logical_lines(physical_lines, joiner=''.join, separator=''):
        return joiner(physical_lines).replace('\\\n', separator).splitlines(True)

In this variant, we join all of the physical lines into one long
string, then we replace the
"canceled" line ends (line ends
immediately preceded by a backslash) with nothing (or any other
separator we're requested to use), and finally split
the resulting long string back into lines (keeping the line
ends; that's what the True
argument to method splitlines is for). This
approach is a very different one from that suggested in this recipe
but possibly worthwhile, if physical_lines is small
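enough that you can afford the memory for it. I prefer the
approach shown in the "Solution" because the variant treats a
backslash followed by trailing whitespace as an ordinary line end,
and giving semantic significance to trailing whitespace
is a poor user interface design choice.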
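
The difference in how the two versions treat trailing whitespace can be seen side by side. This is a small sketch under stated assumptions: the names logical_lines_gen and logical_lines_plain are hypothetical renames, introduced only so both versions can live in one module, and the sample inputs are made up for illustration.

```python
def logical_lines_gen(physical_lines, joiner=''.join):
    # generator version from the Solution (renamed for this comparison)
    logical_line = []
    for line in physical_lines:
        stripped = line.rstrip()
        if stripped.endswith('\\'):
            logical_line.append(stripped[:-1])
        else:
            logical_line.append(line)
            yield joiner(logical_line)
            logical_line = []
    if logical_line:
        yield joiner(logical_line)

def logical_lines_plain(physical_lines, joiner=''.join, separator=''):
    # plain-function variant from the Discussion (renamed for this comparison)
    return joiner(physical_lines).replace('\\\n', separator).splitlines(True)

clean = ['a\\\n', 'b\n']      # backslash immediately before the newline
padded = ['a\\ \n', 'b\n']    # a stray space after the backslash

gen_clean = list(logical_lines_gen(clean))      # ['ab\n']
plain_clean = logical_lines_plain(clean)        # ['ab\n']
gen_padded = list(logical_lines_gen(padded))    # ['ab\n']
plain_padded = logical_lines_plain(padded)      # ['a\\ \n', 'b\n']
```

With the clean input both versions agree. With the padded input, only the generator still joins the two lines; the plain function leaves them separate, because the backslash is no longer immediately followed by the line end.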
See Also
Recipe 19.10;
Perl Cookbook recipe 8.1; Chapter 1 for general issues about handling text;
Chapter 2 for general issues about handling
files.