20.5. Converting HTML to ASCII
20.5.1. Problem
You want to convert an HTML file into
formatted, plain ASCII. For example, you want to mail a web document
to someone.
20.5.2. Solution
If you have an external formatter such as lynx installed,
call it from your program:
$ascii = `lynx -dump $filename`;
If you want to do it within your program and don't care about the
things that the HTML::FormatText formatter doesn't yet handle well
(tables and frames):
use HTML::FormatText 3;
$ascii = HTML::FormatText->format_file(
    $filename,
    leftmargin => 0, rightmargin => 50
);
20.5.3. Discussion
These examples both assume the HTML is in a file. If your HTML is in
a variable, you need to write it to a file for lynx to read. With
HTML::FormatText, use the format_string( ) method:
use HTML::FormatText 3;
$ascii = HTML::FormatText->format_string(
    $html,    # a string of HTML, not a filename
    leftmargin => 0, rightmargin => 50
);
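For the lynx route, writing the variable to a file first might look
something like the sketch below. It is only one way to do it, under
some assumptions: the helper names html_to_tempfile and
lynx_dump_string are made up for illustration, the temporary file is
given a .html suffix so that lynx treats it as HTML, and lynx is
assumed to be on your PATH. File::Temp is a standard module.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: save a string of HTML to a temporary file and
# return the file's name. UNLINK => 1 asks File::Temp to delete the
# file when the program exits.
sub html_to_tempfile {
    my ($html) = @_;
    my ($fh, $tmpname) = tempfile(SUFFIX => '.html', UNLINK => 1);
    print {$fh} $html;
    close($fh) or die "Can't close $tmpname: $!";
    return $tmpname;
}

# Hypothetical helper: format a string of HTML with lynx -dump.
# The list form of open runs lynx directly, without a shell, so the
# filename needs no quoting.
sub lynx_dump_string {
    my ($html) = @_;
    my $tmpname = html_to_tempfile($html);
    open(my $pipe, '-|', 'lynx', '-dump', $tmpname)
        or die "Can't run lynx: $!";
    local $/;                       # slurp all of lynx's output
    my $ascii = <$pipe>;
    close($pipe);
    return $ascii;
}

# Usage (requires lynx to be installed):
#   my $ascii = lynx_dump_string("<h1>Hello</h1><p>World</p>");
```

The backtick form shown in the Solution would work here too, once the
HTML has been written to the temporary file.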
If you use Netscape, its "Save as" option with the type set to "Text"
does the best job with tables.
20.5.4. See Also
The documentation for the CPAN modules HTML::TreeBuilder and
HTML::FormatText; your system's lynx(1) manpage;
Recipe 20.6