Chapter 20. Web Automation - Perl CD Bookshelf [Electronic resources]

Chapter 20. Web Automation


Contents:

Introduction

Fetching a URL from a Perl Script

Automating Form Submission

Extracting URLs

Converting ASCII to HTML

Converting HTML to ASCII

Extracting or Removing HTML Tags

Finding Stale Links

Finding Fresh Links

Using Templates to Generate HTML

Mirroring Web Pages

Creating a Robot

Parsing a Web Server Log File

Processing Server Logs

Using Cookies

Fetching Password-Protected Pages

Fetching https:// Web Pages

Resuming an HTTP GET

Parsing HTML

Extracting Table Data

Program: htmlsub

Program: hrefsub

The web, then, or the pattern, a web at once sensuous and logical, an
elegant and pregnant texture: that is style, that is the foundation
of the art of literature.

Robert Louis Stevenson, On some Technical Elements of Style in
Literature (1885)


20.0. Introduction


Chapter 19 concentrated on responding to browser requests and producing
documents using CGI. This chapter approaches the Web from the other
side: instead of responding to a browser, you pretend to be one,
generating requests and processing returned documents. We make
extensive use of modules to simplify this process because the
intricate network protocols and document formats are tricky to get
right. By letting existing modules handle the hard parts, you can
concentrate on the interesting part—your own program.

The relevant modules can all be found under the following URL:

http://search.cpan.org/modlist/World_Wide_Web


There you'll
find modules for computing credit card checksums, interacting with
Netscape or Apache server APIs, processing image maps, validating
HTML, and manipulating MIME. The largest and most important modules
for this chapter, though, are found in the libwww-perl suite of
modules, referred to collectively as LWP.
Table 20-1 lists just a few modules included in LWP.

Table 20-1. LWP modules

Module name          Purpose
-------------------  ------------------------------------------------
LWP::UserAgent       WWW user agent class
LWP::RobotUA         Develop robot applications
LWP::Protocol        Interface to various protocol schemes
LWP::Authen::Basic   Handle 401 and 407 responses
LWP::MediaTypes      MIME types configuration (text/html, etc.)
LWP::Debug           Debug logging module
LWP::Simple          Simple procedural interface for common functions
HTTP::Headers        MIME/RFC 822-style headers
HTTP::Message        HTTP-style message
HTTP::Request        HTTP request
HTTP::Response       HTTP response
HTTP::Daemon         HTTP server class
HTTP::Status         HTTP status code (200 OK, etc.)
HTTP::Date           Date-parsing module for HTTP date formats
HTTP::Negotiate      HTTP content negotiation calculation
WWW::RobotRules      Parse robots.txt files
File::Listing        Parse directory listings

The HTTP:: and LWP:: modules request documents from a server. The
LWP::Simple module offers an easy way to fetch a document. However,
the module can't access individual components of the HTTP response.
For these, use HTTP::Request, HTTP::Response, and LWP::UserAgent. We
show both sets of modules in
Recipe 20.1,
Recipe 20.2, and Recipe 20.10.
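
As a rough illustration of that difference, the sketch below fetches the same URL both ways. It is not taken from the recipes: the helper name summarize_response, the 10-second timeout, and the convention of passing the URL on the command line are all assumptions made for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple ();     # import nothing; call get() by its full name
use LWP::UserAgent;
use HTTP::Request;

# summarize_response (hypothetical helper): pull individual components
# out of an HTTP::Response object and format them on one line.
sub summarize_response {
    my ($response) = @_;
    return sprintf "%s | %s | %d bytes",
        $response->status_line,
        scalar($response->header('Content-Type')) // 'unknown type',
        length($response->content);
}

# Only touch the network when a URL is supplied on the command line.
if (my $url = shift @ARGV) {
    # LWP::Simple: one call that yields the document body or undef.
    my $body = LWP::Simple::get($url);
    print defined $body
        ? "get() returned " . length($body) . " bytes\n"
        : "get() failed and gives no further detail\n";

    # LWP::UserAgent: the whole response object is available.
    my $ua       = LWP::UserAgent->new(timeout => 10);   # timeout is an assumption
    my $response = $ua->request(HTTP::Request->new(GET => $url));
    print summarize_response($response), "\n";
}
```

Run it with a URL as its only argument. The first line of output reflects all that get() can tell you, whereas the second draws the status line and a header from the response object.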

The HTML:: modules, once distributed with LWP, now come in
distributions of their own. They parse HTML and provide the basis for
Recipe 20.3, Recipe 20.4, Recipe 20.5, Recipe 20.6, Recipe 20.7, and
the programs htmlsub and hrefsub.
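
To give a feel for what these parsers do, here is a minimal sketch that uses HTML::LinkExtor, one of the HTML:: modules, to collect link attributes from a fragment of markup. The helper name extract_links and the sample HTML are assumptions for illustration. The recipes themselves may take a different approach.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::LinkExtor;

# extract_links (hypothetical helper): return "tag url" strings for every
# link-bearing attribute that HTML::LinkExtor reports in $html.
sub extract_links {
    my ($html) = @_;
    my @links;
    my $parser = HTML::LinkExtor->new(sub {
        my ($tag, %attr) = @_;          # %attr maps attribute name to URL
        push @links, "$tag $attr{$_}" for sort keys %attr;
    });
    $parser->parse($html);
    $parser->eof;                       # flush anything still buffered
    return @links;
}

# Made-up sample document.
my $sample = '<p><a href="http://www.perl.com/">Perl</a> '
           . '<img src="camel.png" alt="camel"></p>';
print "$_\n" for extract_links($sample);
```

Because no base URL is passed to the constructor, relative links such as camel.png come back exactly as written in the document.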

Recipe 20.12 gives a regular expression to
decode fields in your web server's log files and shows how to
interpret the fields. We use this regular expression and the
Logfile::Apache module in Recipe 20.13 to show
two ways of summarizing data in web server log files.
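
The sketch below shows the general idea of decoding a log line with a regular expression. It assumes the server writes Common Log Format, and the pattern is a simplified one written for this example, not the expression from Recipe 20.12. The helper name parse_clf and the sample line are likewise assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# parse_clf (hypothetical helper): split one Common Log Format line into
# named fields. Returns a hash reference, or undef if the line does not match.
sub parse_clf {
    my ($line) = @_;
    my ($client, $ident, $user, $when, $method, $url, $proto, $status, $bytes) =
        $line =~ /^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+) (.*?) (\S+)" (\d+) (\S+)/
        or return;
    return {
        client => $client,  ident  => $ident,  user   => $user,
        when   => $when,    method => $method, url    => $url,
        proto  => $proto,   status => $status,
        bytes  => $bytes eq '-' ? 0 : $bytes,   # "-" means no body was sent
    };
}

# Made-up sample entry in Common Log Format.
my $entry = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          . '"GET /apache_pb.gif HTTP/1.0" 200 2326';
if (my $f = parse_clf($entry)) {
    print "$f->{client} requested $f->{url} (status $f->{status})\n";
}
```

Returning a hash keyed by field name keeps later summarizing code readable, since it can refer to the status or byte count by name instead of by capture position.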

For detailed guidance on the LWP modules, see Sean Burke's
Perl & LWP (O'Reilly). This book expands on much of this chapter,
picking up where recipes such as Recipe 20.5 on converting HTML to
ASCII, Recipe 20.14 on fetching pages that use cookies, and
Recipe 20.15 on fetching password-protected pages leave off.







Copyright © 2003 O'Reilly & Associates. All rights reserved.
