Perl Cd Bookshelf [Electronic resources]

6.21. Program: urlify

This program puts HTML links around URLs in files. It doesn't work on all possible URLs, but does hit the most common ones. It tries to avoid including end-of-sentence punctuation in the marked-up URL.

It is a typical Perl filter, so it can be fed input from a pipe:

% gunzip -c ~/mail/archive.gz | urlify > archive.urlified

or by supplying files on the command line:

% urlify ~/mail/*.inbox > ~/allmail.urlified

The program is shown in Example 6-10.

Example 6-10. urlify

#!/usr/bin/perl
# urlify - wrap HTML links around URL-like constructs

$protos = '(http|telnet|gopher|file|wais|ftp)';
$ltrs   = '\w';
$gunk   = ';/#~:.?+=&%@!\-';
$punc   = '.:?\-';
$any    = "${ltrs}${gunk}${punc}";

while (<>) {
    s{
      \b                    # start at word boundary
      (                     # begin $1  {
       $protos   :          # need resource and a colon
       [$any] +?            # followed by one or more
                            #  of any valid character, but
                            #  be conservative and take only
                            #  what you need to....
      )                     # end   $1  }
      (?=                   # look-ahead non-consumptive assertion
       [$punc]*             # either 0 or more punctuation
       [^$any]              #   followed by a non-url char
       |                    # or else
       $                    #   then end of the string
      )
    }{<A HREF="$1">$1</A>}igox;
    print;
}
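
To see how the look-ahead keeps end-of-sentence punctuation out of the link, the same pattern can be tried on a couple of sample lines. The following is a minimal sketch, not part of the original program: the sub name urlify_line and the example.com and example.org addresses are made up for illustration, and the pattern is precompiled with qr// rather than relying on the /o flag. The sample lines end in a newline, as lines read with <> normally do.

```perl
#!/usr/bin/perl
# Sketch: exercise the urlify pattern on sample lines.
# urlify_line and the sample URLs are hypothetical, for illustration only.
use strict;
use warnings;

# Same character classes as in Example 6-10.
my $protos = '(http|telnet|gopher|file|wais|ftp)';
my $ltrs   = '\w';
my $gunk   = ';/#~:.?+=&%@!\-';
my $punc   = '.:?\-';
my $any    = "${ltrs}${gunk}${punc}";

# Compile once with qr// (used here in place of the /o flag).
my $url_re = qr{
    \b                  # start at word boundary
    (                   # capture the whole URL
      $protos :         # scheme and a colon
      [$any]+?          # one or more URL characters, as few as possible
    )
    (?=                 # look ahead without consuming
      [$punc]*          # optional trailing punctuation
      [^$any]           #   followed by a non-URL character
      |                 # or else
      $                 #   the end of the string
    )
}ix;

# Return a copy of the line with each URL wrapped in an anchor tag.
sub urlify_line {
    my ($line) = @_;
    $line =~ s{$url_re}{<A HREF="$1">$1</A>}g;
    return $line;
}

# The period after "docs/" stays outside the link.
print urlify_line("See http://www.example.com/docs/. Then stop.\n");

# So does the question mark after "README".
print urlify_line("Is it at ftp://ftp.example.org/pub/README?\n");
```

The minimal quantifier +? stops at the first position where the look-ahead succeeds. A period inside a hostname is followed by another URL character, so matching continues past it, while a period followed by a space or newline ends the match just before the period.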