6.21. Program: urlify
This program puts HTML links around URLs
in files. It doesn't work on all possible URLs, but does hit the most
common ones. It tries to avoid including end-of-sentence punctuation
in the marked-up URL.
It is a typical Perl filter, so it can be fed input from a pipe:
% gunzip -c ~/mail/archive.gz | urlify > archive.urlified
or by supplying files on the command line:
% urlify ~/mail/*.inbox > ~/allmail.urlified
The program is shown in Example 6-10.
Example 6-10. urlify
#!/usr/bin/perl
# urlify - wrap HTML links around URL-like constructs
$protos = '(http|telnet|gopher|file|wais|ftp)';
$ltrs = '\w';
$gunk = ';/#~:.?+=&%@!\-';
$punc = '.:?\-';
$any = "${ltrs}${gunk}${punc}";
while (<>) {
    s{
        \b                    # start at word boundary
        (                     # begin $1  {
          $protos :           # need resource and a colon
          [$any] +?           # followed by one or more
                              #  of any valid character, but
                              #  be conservative and take only
                              #  what you need to....
        )                     # end   $1  }
        (?=                   # look-ahead non-consumptive assertion
          [$punc]*            # either 0 or more punctuation
          [^$any]             # followed by a non-url char
          |                   # or else
          $                   # then end of the string
        )
    }{<A HREF="$1">$1</A>}igox;
    print;
}
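To see how the minimal match and the look-ahead cooperate to leave end-of-sentence punctuation outside the link, the same substitution can be wrapped in a subroutine and run on a single string. This is only a sketch: the name urlify_line and the sample sentence are made up for illustration, and the /o flag is dropped because the pattern variables never change here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same pattern pieces as Example 6-10, declared lexically.
my $protos = '(http|telnet|gopher|file|wais|ftp)';
my $ltrs   = '\w';
my $gunk   = ';/#~:.?+=&%@!\-';
my $punc   = '.:?\-';
my $any    = "${ltrs}${gunk}${punc}";

# urlify_line is a hypothetical helper, not part of the original program.
# It returns a copy of its argument with URL-like text wrapped in links.
sub urlify_line {
    my ($line) = @_;
    $line =~ s{
        \b
        (
          $protos :
          [$any] +?
        )
        (?=
          [$punc]* [^$any]
          |
          $
        )
    }{<A HREF="$1">$1</A>}igx;
    return $line;
}

print urlify_line("See http://www.perl.com/CPAN/. It is mirrored widely.\n");
# prints: See <A HREF="http://www.perl.com/CPAN/">http://www.perl.com/CPAN/</A>. It is mirrored widely.
```

The period after CPAN/ stays outside the link. The minimal quantifier stops as soon as the look-ahead can see optional punctuation followed by a character that cannot appear in a URL (here, the space after the period).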