
![]() | ![]() |
1.23. Program: fixstyle
Imagine you have a table with
both old and new strings, such as the following:
Old words | New words |
---|---|
bonnet | hood |
rubber | eraser |
lorry | truck |
trousers | pants |
all occurrences of each element in the first set to the corresponding
element in the second set.When called without filename arguments, the program is a simple
filter. If filenames are supplied on the command line, an in-place
edit writes the changes to the files, with the original versions
saved in a file with a ".orig" extension. See
Recipe 7.16 for a description. A -v command-line option writes notification of
each change to standard error.The table of original strings and their replacements is stored below
_ _END_ _ in the main program, as described in
Recipe 7.12. Each pair of strings is
converted into carefully escaped substitutions and accumulated into
the $code variable like the
popgrep2 program in Recipe 6.10.A -t check to test for an interactive run check
tells whether we're expecting to read from the keyboard if no
arguments are supplied. That way if users forget to give an argument,
they aren't wondering why the program appears to be hung.
Example 1-4. fixstyle
#!/usr/bin/perl -w
# fixstyle - switch first set of <DATA> strings to second set
# usage: $0 [-v] [files ...]
use strict;
my $verbose = (@ARGV && $ARGV[0] eq '-v' && shift);
if (@ARGV) {
$^I = ".orig"; # preserve old files
} else {
warn "$0: Reading from stdin\n" if -t STDIN;
}
my $code = "while (<>) {\n";
# read in config, build up code to eval
while (<DATA>) {
chomp;
my ($in, $out) = split /\s*=>\s*/;
next unless $in && $out;
$code .= "s{\\Q$in\\E}{$out}g";
$code .= "&& printf STDERR qq($in => $out at \$ARGV line \$.\\n)"
if $verbose;
$code .= ";\n";
}
$code .= "print;\n}\n";
eval "{ $code } 1" || die;
_ _END_ _
analysed => analyzed
built-in => builtin
chastized => chastised
commandline => command-line
de-allocate => deallocate
dropin => drop-in
hardcode => hard-code
meta-data => metadata
multicharacter => multi-character
multiway => multi-way
non-empty => nonempty
non-profit => nonprofit
non-trappable => nontrappable
pre-define => predefine
preextend => pre-extend
re-compiling => recompiling
reenter => re-enter
turnkey => turn-key
One caution: this program is fast, but it doesn't scale if you need
to make hundreds of changes. The larger the DATA
section, the longer it takes. A few dozen changes won't slow it down,
and in fact, the version given in Example 1-4 is
faster for that case. But if you run the program on hundreds of
changes, it will bog down.Example 1-5 is a version that's slower for few
changes but faster when there are many changes.
Example 1-5. fixstyle2
#!/usr/bin/perl -w
# fixstyle2 - like fixstyle but faster for many many changes
use strict;
my $verbose = (@ARGV && $ARGV[0] eq '-v' && shift);
my %change = ( );
while (<DATA>) {
chomp;
my ($in, $out) = split /\s*=>\s*/;
next unless $in && $out;
$change{$in} = $out;
}
if (@ARGV) {
$^I = ".orig";
} else {
warn "$0: Reading from stdin\n" if -t STDIN;
}
while (<>) {
my $i = 0;
s/^(\s+)// && print $1; # emit leading whitespace
for (split /(\s+)/, $_, -1) { # preserve trailing whitespace
print( ($i++ & 1) ? $_ : ($change{$_} || $_));
}
}
_ _END_ _
analysed => analyzed
built-in => builtin
chastized => chastised
commandline => command-line
de-allocate => deallocate
dropin => drop-in
hardcode => hard-code
meta-data => metadata
multicharacter => multi-character
multiway => multi-way
non-empty => nonempty
non-profit => nonprofit
non-trappable => nontrappable
pre-define => predefine
preextend => pre-extend
re-compiling => recompiling
reenter => re-enter
turnkey => turn-key
This version breaks each line into chunks of whitespace and words,
which isn't a fast operation. It then uses those words to look up
their replacements in a hash, which is much faster than a
substitution. So the first part is slower, the second faster. The
difference in speed depends on the number of matches.If you don't care about keeping the whitespace separating each word
constant, the second version can run as fast as the first, even for a
few changes. If you know a lot about your input, collapse whitespace
into single blanks by plugging in this loop:# very fast, but whitespace collapse
while (<>) {
for (split) {
print $change{$_} || $_, " ";
}
print "\n";
}
That leaves an extra blank at the end of each line. If that's a
problem, you could use the technique from Recipe 16.5 to install an output filter. Place the
following code in front of the while loop that's
collapsing whitespace:my $pid = open(STDOUT, "|-");
die "cannot fork: $!" unless defined $pid;
unless ($pid) { # child
while (<STDIN>) {
s/ $//;
print;
}
exit;
}
![]() | ![]() | ![]() |
1.22. Soundex Matching | ![]() | 1.24. Program: psgrep |

Copyright © 2003 O'Reilly & Associates. All rights reserved.