
6.7. Reading Records with a Separator
6.7.1. Problem
You want to read records separated by a
pattern, but Perl doesn't allow its input record separator variable
to be a regular
expression.
Many problems, most obviously those involving parsing complex file
formats, become simpler when you can extract records separated by
different
strings.
6.7.2. Solution
Read the whole file and use split:
undef $/;
@chunks = split(/pattern/, <FILEHANDLE>);
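As a minimal, self-contained sketch of that idea, the following reads from an in-memory filehandle instead of a real file. The sample data and the dashed-line separator pattern are made up for illustration and are not part of the recipe. It also uses local rather than a bare undef, so that $/ is changed only inside the block.

```perl
use strict;
use warnings;

# Hypothetical sample data: records separated by a line of two or more dashes.
my $data = "first record\n----\nsecond record\n--\nthird record\n";

# Open a read-only in-memory filehandle on the string (stands in for a real file).
open(my $fh, '<', \$data) or die "can't open in-memory file: $!";

my @chunks;
{
    local $/;                             # undef inside this block: slurp mode
    @chunks = split(/^-{2,}\n/m, <$fh>);  # a single read returns the whole file
}
close($fh);

print scalar(@chunks), " chunks\n";
print "[$_]\n" for @chunks;
```

Because the pattern consumes the newline after the dashes, each chunk here ends with its own trailing newline and nothing from the separator line is left behind.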
6.7.3. Discussion
Perl's official record separator, the $/ variable,
must be a fixed string, not a pattern. To sidestep this limitation,
undefine the input record separator entirely so that the next
readline operation reads the rest of the file.
This is sometimes called slurp mode, because
it slurps in the whole file as one big string. Then
split that huge string using the record separating
pattern as the first argument.

Here's an example where the input stream is a text file that includes
lines consisting of ".Se",
".Ch", and ".Ss", which are
special codes in the troff macro set that this
book was developed under. These strings are the separators, and we
want to find text that falls between them.

# .Ch, .Se and .Ss divide chunks of STDIN
{
    local $/ = undef;
    @chunks = split(/^\.(Ch|Se|Ss)$/m, <>);
}
print "I read ", scalar(@chunks), " chunks.\n";
We create a localized version of $/ so its
previous value is restored once the block finishes. By using
split with parentheses in the pattern, captured
separators are also returned. This way data elements in the return
list alternate with elements containing "Se",
"Ch", or "Ss".

If you don't want separators returned, but still need parentheses,
use non-capturing parentheses in the pattern:
/^\.(?:Ch|Se|Ss)$/m.

To split before a pattern but include the
pattern in the return, use a lookahead assertion:
/^(?=\.(?:Ch|Se|Ss))/m. That way each chunk except
the first starts with the pattern.

Be aware that this uses a lot of memory when the file is large.
However, with today's machines and typical text files, this is less
often an issue now than it once was. Just don't try it on a 200 MB
logfile unless you have plenty of virtual memory for swapping out to
disk! Even if you do have enough swap space, you'll likely end up
thrashing.
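
The three pattern variants discussed above (capturing, non-capturing, and lookahead) can be compared side by side on a small string. This is only a sketch: the sample text and the show helper are hypothetical, and the string stands in for what a slurped read would return.

```perl
use strict;
use warnings;

# Hypothetical troff-like input; the text between the macro lines is made up.
my $text = "preamble\n.Ch\nchapter text\n.Se\nsection text\n.Ss\nsubsection text\n";

# Print a labeled list, one element per line, with newlines made visible.
sub show {
    my ($label, @list) = @_;
    print "$label: ", scalar(@list), " elements\n";
    for my $item (@list) {
        (my $visible = $item) =~ s/\n/\\n/g;
        print "  [$visible]\n";
    }
}

# Capturing parentheses: separators come back interleaved with the data.
my @with_seps = split(/^\.(Ch|Se|Ss)$/m, $text);

# Non-capturing parentheses: only the data between separators is returned.
my @data_only = split(/^\.(?:Ch|Se|Ss)$/m, $text);

# Lookahead: split just before each separator, so chunks keep their macro line.
my @keep_sep = split(/^(?=\.(?:Ch|Se|Ss))/m, $text);

show("capturing",     @with_seps);
show("non-capturing", @data_only);
show("lookahead",     @keep_sep);
```

With this input, the capturing form returns seven elements and the other two return four each. In the first two forms, $ matches before the newline without consuming it, so each chunk after a separator begins with that leftover newline.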
6.7.4. See Also
The $/ variable in perlvar(1)
and in the "Per-Filehandle Variables" section of Chapter 28 of
Programming Perl; the split
function in perlfunc(1) and Chapter 29 of
Programming Perl; we talk more about the
special variable $/ in Chapter 8.

Copyright © 2003 O'Reilly & Associates. All rights reserved.