Perl Cd Bookshelf [Electronic resources] - text version

6.4. Commenting Regular Expressions


6.4.1. Problem



You want to make your complex
regular expressions understandable and maintainable.

6.4.2. Solution


You have several techniques at your disposal: choosing alternate
delimiters to cut down on backslashes, placing comments outside the
pattern or inside it using the /x modifier, and
building up patterns piecemeal in named variables.

6.4.3. Discussion


The sample code in Example 6-1 uses the
first couple of techniques, and its initial comment describes the
overall intent of the regular expression. For simple patterns, this
may be all that is needed. More complex patterns, as in the example,
require more documentation.

Example 6-1. resname


#!/usr/bin/perl -p
# resname - change all "foo.bar.com" style names in the input stream
# into "foo.bar.com [204.148.40.9]" (or whatever) instead
use Socket;                     # load inet_ntoa
s{
    (                           # capture the hostname in $1
        (?:                     # these parens for grouping only
            (?! [-_] )          # lookahead for neither underscore nor dash
            [\w-] +             # hostname component
            \.                  # and the domain dot
        ) +                     # now repeat that whole thing a bunch of times
        [A-Za-z]                # next must be a letter
        [\w-] +                 # now trailing domain part
    )                           # end of $1 capture
}{                              # replace with this:
    "$1 " .                     # the original bit, plus a space
        ( ($addr = gethostbyname($1))       # if we get an addr
            ? "[" . inet_ntoa($addr) . "]"  # format it
            : "[???]"                       # else mark dubious
        )
}gex;                           # /g for global
                                # /e for execute
                                # /x for nice formatting

For aesthetics, the example uses alternate delimiters. When you split
your match or substitution over multiple lines, using matching braces
aids readability. A more common use of alternate delimiters is for
patterns and replacements that themselves contain slashes, such as in
s/\/\//\/..\//g. Alternate delimiters, as in
s!//!/../!g or s{//}{/../}g,
avoid escaping the non-delimiting slashes with backslashes, again
improving legibility.
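
As a quick check that the three spellings really are interchangeable, the following sketch applies each one to the same string. The sample path and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input; any string containing "//" would do.
my $path = "usr//local//bin";

(my $slashes = $path) =~ s/\/\//\/..\//g;   # default delimiters, heavy escaping
(my $bangs   = $path) =~ s!//!/../!g;       # bang delimiters, no escaping
(my $braces  = $path) =~ s{//}{/../}g;      # paired braces, no escaping

# All three substitutions produce the same string.
print "$_\n" for $slashes, $bangs, $braces;
```

The only difference between the three statements is how much of the pattern is taken up by escapes.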

The /x
pattern modifier makes Perl ignore whitespace in the pattern (outside
a character class) and treat # characters and
their following text as comments. The /e modifier
changes the replacement portion from a string into code to run. Since
it's code, you can put regular comments there, too.
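
The two modifiers can be seen in isolation in a small sketch like the one below. The sample strings and variable names are invented for this illustration.

```perl
use strict;
use warnings;

my $text = "a b";

# Under /x the blanks in the pattern are ignored, so this looks for "ab".
my $bare_space  = $text =~ / a b /x     ? 1 : 0;

# Inside a character class a blank is still literal under /x.
my $class_space = $text =~ / a [ ] b /x ? 1 : 0;

# With /e the replacement is Perl code, so ordinary comments work there.
my $order = "3 apples, 12 pears";
(my $doubled = $order) =~ s{
    (\d+)       # a run of digits
}{
    $1 * 2      # ordinary Perl comment: this part is code, not a string
}gex;

print "bare: $bare_space, class: $class_space\n";
print "$doubled\n";
```

The first match fails because the pattern, once its whitespace is stripped, requires the letters to be adjacent; the second succeeds because the bracketed blank is kept.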

To include literal whitespace or # characters in a
pattern to which you've applied /x, escape them
with a backslash:

s/                  # replace
    \#              #   a pound sign
    (\w+)           #   the variable name
    \#              #   another pound sign
/${$1}/xg;          # with the value of the global variable
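
To try that substitution under use strict, one possible setup is sketched here. It assumes the variable being looked up is a package (global) variable, since ${$1} is a symbolic reference and cannot see lexicals; the names $user and $template are hypothetical.

```perl
use strict;
use warnings;

our $user = "gnat";                     # package variable, reachable by name
my $template = "hello, #user#";

{
    no strict 'refs';                   # ${$1} looks the variable up by name
    $template =~ s/ \# (\w+) \# /${$1}/xg;
}

print "$template\n";
```

Without the backslashes, each # would start a comment under /x and the rest of the pattern line would be ignored.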

Remember that comments should explain what you're doing and why, not
merely restate the code. Using "$i++ # add one to i" is apt to lose
points in your programming course or at least get you talked about in
substellar terms by your coworkers.

The last technique for rendering patterns more legible (and thus,
more maintainable) is to place each semantic unit into a variable
given an appropriate name. We use single quotes instead of doubles so
backslashes don't get lost.

$optional_sign    = '[-+]?';
$mandatory_digits = '\d+';
$decimal_point    = '\.?';
$optional_digits  = '\d*';
$number = $optional_sign
        . $mandatory_digits
        . $decimal_point
        . $optional_digits;

Then use $number in further patterns:

if (/($number)/) {          # parse out one
    $found = $1;
}
@allnums = /$number/g;      # parse all out
unless (/^$number$/) {      # any extra?
    print "need a number, just a number\n";
}
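
For a self-contained run of those three uses, a sketch along these lines may help. It matches against an explicit string rather than $_, and the sample input is invented.

```perl
use strict;
use warnings;

# Same pieces as above, joined into one string.
my $number = '[-+]?' . '\d+' . '\.?' . '\d*';

my $line = "temps: -3.5 12 +7.";                    # hypothetical input

my ($found)  = $line =~ /($number)/;                # first number only
my @allnums  = $line =~ /$number/g;                 # every number
my $just_one = "42.0" =~ /^$number$/ ? 1 : 0;       # nothing but a number?
my $extra    = "42.0 kg" =~ /^$number$/ ? 1 : 0;    # trailing text rejected

print "first: $found\n";
print "all:   @allnums\n";
```

Note that the trailing "7." is accepted because the decimal point is optional and the digits after it may be empty.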

We can even combine all of these techniques:

# check for line of whitespace-separated numbers
m{
    ^ \s *              # optional leading whitespace
    $number             # at least one number
    (?:                 # begin optional cluster
        \s +            # must have some separator
        $number         # then the next number
    ) *                 # repeat at will
    \s * $              # optional trailing whitespace
}x

which is certainly a lot better than writing:

/^\s*[-+]?\d+\.?\d*(?:\s+[-+]?\d+\.?\d*)*\s*$/
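
One way to exercise the commented form is to wrap it in a small predicate, as in the sketch below. The function name is_number_line and the test strings are hypothetical, not part of the recipe.

```perl
use strict;
use warnings;

my $number = '[-+]?\d+\.?\d*';

# Returns 1 if the line holds only whitespace-separated numbers, else 0.
sub is_number_line {
    my ($line) = @_;
    return $line =~ m{
        ^ \s *              # optional leading whitespace
        $number             # at least one number
        (?:                 # begin optional cluster
            \s +            # must have some separator
            $number         # then the next number
        ) *                 # repeat at will
        \s * $              # optional trailing whitespace
    }x ? 1 : 0;
}

for my $line ("  1 -2.5 +3. ", "1 two 3", "") {
    printf "%-16s %s\n", "'$line'", is_number_line($line) ? "ok" : "rejected";
}
```

Keeping the pattern in one named function also gives the comments a single home if the number format changes later.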

Patterns that you put in variables should probably not contain
capturing parentheses or backreferences, since a capture in one
variable could change the numbering of those in others.
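
The renumbering problem is easy to reproduce. In this sketch the fragment names and the date string are invented; the outer pattern's author expects the month in $1.

```perl
use strict;
use warnings;

my $year_capturing = '(\d{4})';     # hypothetical fragment with a capture
my $year_cluster   = '(?:\d{4})';   # same fragment, grouping only

my $date = "2003-08";

# The capture hidden in the fragment takes $1, pushing the month to $2.
my $with_capture = $date =~ /^${year_capturing}-(\d\d)$/ ? $1 : undef;

# With a non-capturing fragment, the month lands in $1 as intended.
my $with_cluster = $date =~ /^${year_cluster}-(\d\d)$/ ? $1 : undef;

print "capturing fragment: \$1 is $with_capture\n";
print "cluster fragment:   \$1 is $with_cluster\n";
```

Nothing in the outer pattern hints that the first version is wrong, which is what makes this kind of bug hard to spot.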

Clustering parentheses, that is, /(?:...)/
instead of /(...)/, are fine, though. Not
only are they fine, they're necessary if you want to apply a
quantifier to the whole variable. For example:

$number = "(?:"
        . $optional_sign
        . $mandatory_digits
        . $decimal_point
        . $optional_digits
        . ")";

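Now you can say /$number+/ and have the plus apply
to the whole number group. Without the grouping, the plus would have
landed right after the last star. Older Perls reject that as a nested
quantifier, and Perl 5.10 and later read *+ as a possessive
quantifier; either way, it is not the repetition you meant.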
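
The difference can be checked with a sketch like this one. The input string is invented, and the eval is there only because what an ungrouped trailing plus does depends on the Perl version: it may fail to compile or may compile to something other than plain repetition.

```perl
use strict;
use warnings;

my $flat    = '[-+]?\d+\.?\d*';     # no cluster around the pieces
my $grouped = "(?:$flat)";          # same pieces inside (?:...)

my $input = "+1-2+3";               # three signed numbers run together

# The plus repeats the whole number, so all three are consumed.
my $grouped_ok = $input =~ /^$grouped+$/ ? 1 : 0;

# Here the plus attaches to the final \d* only; guard against a compile error.
my $flat_ok = eval { $input =~ /^$flat+$/ ? 1 : 0 } || 0;

print "grouped: $grouped_ok, flat: $flat_ok\n";
```

Only the grouped form treats the input as a run of numbers.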

One more trick with clustering parentheses is that you can embed a
modifier switch that applies only to that cluster. For example:

$hex_digit = '(?i:[0-9a-f])';
$hdr_line  = '(?m:^[^:]*:.*)';

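The qr// construct does this automatically using
cluster parentheses, enabling any modifiers you specified and
disabling any you didn't for that cluster. The stringified output
shown here comes from an older Perl; Perl 5.14 and later print an
equivalent shorthand of the form (?^i:...):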

$hex_digit = qr/[0-9a-f]/i;
$hdr_line  = qr/^[^:]*:.*/m;
print "hex digit is: $hex_digit\n";
print "hdr line is: $hdr_line\n";

hex digit is: (?i-xsm:[0-9a-f])
hdr line is: (?m-xis:^[^:]*:.*)
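
That a cluster's modifiers stay inside the cluster, in both directions, can be checked with a sketch such as the following. It assumes a hex digit class of [0-9a-f], and the "0x" prefix and test strings are invented for the illustration.

```perl
use strict;
use warnings;

my $hex_digit = '(?i:[0-9a-f])';    # case-insensitive inside the cluster only
my $lower_hex = qr/[0-9a-f]/;       # qr// without /i switches /i off inside

# The cluster's /i does not leak out: the "0x" prefix stays case-sensitive.
my $upper_digits = "0xFF" =~ /^0x$hex_digit+$/ ? 1 : 0;
my $upper_prefix = "0XFF" =~ /^0x$hex_digit+$/ ? 1 : 0;

# The outer /i does not leak in: the qr// digits stay lowercase-only.
my $outer_i_prefix = "0Xff" =~ /^0x$lower_hex+$/i ? 1 : 0;
my $outer_i_digits = "0XFF" =~ /^0x$lower_hex+$/i ? 1 : 0;

print "$upper_digits $upper_prefix $outer_i_prefix $outer_i_digits\n";
```

This is why a qr// object behaves the same wherever it is interpolated, regardless of the modifiers on the enclosing pattern.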

It's probably a good idea to use qr// in the first
place:

$optional_sign    = qr/[-+]?/;
$mandatory_digits = qr/\d+/;
$decimal_point    = qr/\.?/;
$optional_digits  = qr/\d*/;
$number = qr{
    $optional_sign
    $mandatory_digits
    $decimal_point
    $optional_digits
}x;

Although the output can be a bit odd to read:

print "Number is $number\n";
Number is (?x-ism:
(?-xism:[-+]?)
(?-xism:\d+)
(?-xism:\.?)
(?-xism:\d*)
)

6.4.4. See Also


The /x modifier in perlre(1)
and Chapter 5 of Programming Perl; the
"Comments Within a Regular Expression" section of Chapter 7 of
Mastering Regular Expressions







Copyright © 2003 O'Reilly & Associates. All rights reserved.
