
20.4. Converting ASCII to HTML
20.4.1. Problem
You want to convert ASCII text
to HTML. For example, you have mail you want to display intelligently
on a web page.
20.4.2. Solution
Use the simple little encoding filter in Example 20-3.
Example 20-3. text2html
#!/usr/bin/perl -w -p00
# text2html - trivial html encoding of normal text
# -p means apply this script to each record.
# -00 means that a record is now a paragraph

use HTML::Entities;
$_ = encode_entities($_, "\200-\377");

if (/^\s/) {
    # Paragraphs beginning with whitespace are wrapped in <PRE>
    s{(.*)$}        {<PRE>\n$1</PRE>\n}s;           # indented verbatim
} else {
    s{^(>.*)}       {$1<BR>}gm;                     # quoted text
    s{<URL:(.*?)>}  {<A HREF="$1">$1</A>}gs         # embedded URL  (good)
                    ||
    s{(http:\S+)}   {<A HREF="$1">$1</A>}gs;        # guessed URL   (bad)
    s{\*(\S+)\*}    {<STRONG>$1</STRONG>}g;         # this is *bold* here
    s{\b_(\S+)\_\b} {<EM>$1</EM>}g;                 # this is _italics_ here
    s{^}            {<P>\n};                        # add paragraph tag
}
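To see what the inline rules produce, here is a minimal sketch that applies the substitutions from the else-branch of text2html to one paragraph held in a string. The sub name markup_paragraph and the sample text are invented for illustration, and the encode_entities step is left out so that the sketch needs only core Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# markup_paragraph is a hypothetical helper, not part of Example 20-3.
# It runs the same substitutions as the else-branch of text2html,
# minus the encode_entities call.
sub markup_paragraph {
    my ($text) = @_;
    $text =~ s{^(>.*)}{$1<BR>}gm;                     # quoted text
    $text =~ s{<URL:(.*?)>}{<A HREF="$1">$1</A>}gs    # embedded URL
        ||
    $text =~ s{(http:\S+)}{<A HREF="$1">$1</A>}gs;    # guessed URL, tried only
                                                      # if no <URL:...> was found
    $text =~ s{\*(\S+)\*}{<STRONG>$1</STRONG>}g;      # *bold*
    $text =~ s{\b_(\S+)\_\b}{<EM>$1</EM>}g;           # _italics_
    $text =~ s{^}{<P>\n};                             # paragraph tag
    return $text;
}

# Invented sample paragraph.
my $sample = "This is *bold* and _italic_ text.\n"
           . "See <URL:http://www.perl.com/> for more.\n";
print markup_paragraph($sample);
```

Because the two URL substitutions are joined with ||, the guessing rule runs only when the paragraph contains no explicit <URL:...> marker, which keeps an already-linked address from being wrapped a second time.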
20.4.3. Discussion
Converting arbitrary plain text to HTML has no general solution
because there are too many conflicting ways to represent formatting
information. The more you know about the input, the better you can
format it.

For example, if you knew that you would be fed a mail message, you
could add this block to format the mail headers:

BEGIN {
    print "<TABLE>";
    $_ = encode_entities(scalar <>);
    s/\n\s+/ /g;                                    # continuation lines
    while ( /^(\S+?:)\s*(.*)$/gm ) {                # parse heading
        print "<TR><TH ALIGN='LEFT'>$1</TH><TD>$2</TD></TR>\n";
    }
    print "</TABLE><HR>";
}
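The header handling can be tried on its own with a sketch like the following, which folds continuation lines and then emits one table row per header field. The sub name headers_to_table and the sample header are invented for illustration. Entity encoding is again omitted to stay within core Perl, so real input containing characters such as < or & would still need encode_entities first.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# headers_to_table is a hypothetical helper that returns the HTML
# as a string instead of printing it from a BEGIN block.
sub headers_to_table {
    my ($header) = @_;
    $header =~ s/\n\s+/ /g;                      # fold continuation lines
    my $html = "<TABLE>\n";
    while ($header =~ /^(\S+?:)\s*(.*)$/gm) {    # one match per header field
        $html .= "<TR><TH ALIGN='LEFT'>$1</TH><TD>$2</TD></TR>\n";
    }
    return $html . "</TABLE><HR>\n";
}

# Invented sample header with one continuation line.
my $header = "From: alice\@example.com\n"
           . "Subject: a long subject\n"
           . "    that continues here\n"
           . "To: bob\@example.com\n";
print headers_to_table($header);
```

Folding must happen before the field loop. Otherwise the indented continuation line would match neither as part of the Subject value nor as a field of its own, and its text would be dropped.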
The CPAN module HTML::TextToHTML has options for headers, footers,
indentation, tables, and more.
20.4.4. See Also
The documentation for the CPAN modules HTML::Entities and
HTML::TextToHTML

Copyright © 2003 O'Reilly & Associates. All rights reserved.