Perl CD Bookshelf [Electronic resources], text version




20.1. Fetching a URL from a Perl Script


20.1.1. Problem



You have a URL whose contents you want
to fetch from a script.

20.1.2. Solution


Use the get function from the CPAN module
LWP::Simple, part of LWP.

use LWP::Simple;
$content = get($URL);

20.1.3. Discussion


The right library makes life easier, and the LWP modules are the
right ones for this task. As you can see from the Solution, LWP makes
this task a trivial one.

The get function from LWP::Simple returns
undef on error, so check for errors this way:

use LWP::Simple;
unless (defined ($content = get $URL)) {
    die "could not get $URL\n";
}

When called that way, however, you can't determine the cause of the
error. For this and other elaborate processing, you'll have to go
beyond LWP::Simple.
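
As a rough illustration of what lies beyond LWP::Simple, the sketch below inspects HTTP::Response objects directly. The responses are built by hand so the snippet needs no network access; in a real script they would come back from an LWP::UserAgent request. The helper name describe_response is hypothetical, not part of LWP.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: summarize a response, including the failure reason.
sub describe_response {
    my ($response) = @_;
    return "failed: " . $response->status_line if $response->is_error;
    return sprintf "ok: %d bytes", length $response->content;
}

# Hand-built responses stand in for what a user agent would return.
my $missing = HTTP::Response->new(404, "Not Found");
my $found   = HTTP::Response->new(200, "OK",
    [ 'Content-Type' => 'text/plain' ], "hello\n");

print describe_response($missing), "\n";   # failed: 404 Not Found
print describe_response($found),   "\n";   # ok: 6 bytes
```

The status line carries the numeric code and the server's reason phrase, which is exactly the detail that a bare undef from get cannot convey.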

Example 20-1 is a program that fetches a remote
document. If it fails, it prints out the error status line.
Otherwise, it prints out the document title and the number of bytes
of content. We use three modules, two of which are from LWP.

LWP::UserAgent
This module creates a virtual browser. The object returned from the
new constructor is used to make the actual request. We've set the
name of our agent to "Schmozilla/v9.14 Platinum" just to give the
remote webmaster browser-envy when they see it in their logs. This is
useful on obnoxious web servers that needlessly consult the user
agent string to decide whether to return a proper page or an
infuriating "you need Internet Navigator v12 or later to view this
site" cop-out.

HTTP::Response
This is the object type returned when the user agent actually runs
the request. We check it for errors and contents.

URI::Heuristic
This curious little module uses Netscape-style guessing algorithms to
expand partial URLs. For example:

Simple             Guess
perl               http://www.perl.com
www.oreilly.com    http://www.oreilly.com
ftp.funet.fi       ftp://ftp.funet.fi
/etc/passwd        file:/etc/passwd

Although the simple forms listed aren't legitimate URLs (their format
is not in the URI specification), Netscape tries to guess the URLs
they stand for. Because Netscape does it, most other browsers do,
too.
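
To see the guessing in action, a minimal sketch along these lines may help. It calls uf_urlstr, the same function Example 20-1 uses, on inputs that are expanded by pattern alone. A bare word such as "perl" is left out on purpose, since guessing it can involve hostname lookups and the result may depend on the network.

```perl
use strict;
use warnings;
use URI::Heuristic qw(uf_urlstr);

# Inputs that are expanded purely by their shape, with no lookups needed.
my @partial = ('www.oreilly.com', 'ftp.funet.fi', '/etc/passwd');

# Map each partial form to the module's guess.
my %guess = map { $_ => uf_urlstr($_) } @partial;

printf "%-16s => %s\n", $_, $guess{$_} for @partial;
```

On a Unix-like system this should print the http, ftp, and file forms for the three inputs, in that order.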

The source is in Example 20-1.

Example 20-1. titlebytes


#!/usr/bin/perl -w
# titlebytes - find the title and size of documents
use strict;
use LWP::UserAgent;
use HTTP::Response;
use URI::Heuristic;

my $raw_url = shift or die "usage: $0 url\n";
my $url = URI::Heuristic::uf_urlstr($raw_url);
$| = 1;                                 # to flush next line
printf "%s =>\n\t", $url;

# bogus user agent
my $ua = LWP::UserAgent->new();
$ua->agent("Schmozilla/v9.14 Platinum");    # give it time, it'll get there

# bogus referrer to perplex the log analyzers
my $response = $ua->get($url, Referer => "http://wizard.yellowbrick.oz");
if ($response->is_error()) {
    printf " %s\n", $response->status_line;
} else {
    my $content = $response->content();
    my $bytes = length $content;
    my $count = ($content =~ tr/\n/\n/);    # tr returns the number of newlines
    printf "%s (%d lines, %d bytes)\n",
        $response->title() || "(no title)", $count, $bytes;
}

When run, the program produces output like this:

% titlebytes http://www.tpj.com/
http://www.tpj.com/ =>
The Perl Journal (109 lines, 4530 bytes)
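
The line and byte figures come from two small idioms in the example: length on the content string, and tr used purely as a counter. The sketch below tries both on a made-up string standing in for a fetched document body; the sample text is an assumption for illustration only.

```perl
use strict;
use warnings;

# Sample text standing in for a fetched document body.
my $content = "<html>\n<title>Hi</title>\n</html>\n";

my $bytes = length $content;          # size of the string
my $lines = ($content =~ tr/\n//);    # tr returns how many characters matched

print "$lines lines, $bytes bytes\n"; # 3 lines, 33 bytes
```

With an empty replacement list and no /d modifier, tr leaves the string unchanged and simply reports the count.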

Yes, "referer" is not how "referrer" should be spelled. The standards
people got it wrong when they misspelled HTTP_REFERER. Please use
double r's when referring to things in English.

The first argument to the get method is the URL,
and subsequent pairs of arguments are headers and their values.
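
One way to check which headers a call to get actually sets is to intercept the request before it leaves the machine. The sketch below assumes a reasonably recent LWP in which LWP::UserAgent supports add_handler; a request_send handler that returns a response short-circuits the network, so the outgoing request can be examined offline. The URL and the Accept header are placeholders chosen for illustration.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Response;

my $ua = LWP::UserAgent->new;
$ua->agent("Schmozilla/v9.14 Platinum");

# Answer every request locally instead of sending it, so the outgoing
# headers can be inspected without network access.
$ua->add_handler(request_send => sub {
    my ($request) = @_;
    return HTTP::Response->new(200, "OK");
});

my $response = $ua->get(
    "http://www.example.com/",                  # placeholder URL
    Referer => "http://wizard.yellowbrick.oz",  # header name => value
    Accept  => "text/html",                     # a second header pair
);

my $sent = $response->request;    # the request the agent built
print "$_: ", $sent->header($_), "\n" for qw(User-Agent Referer Accept);
```

Each name and value pair after the URL ends up as one header on the request, alongside defaults such as User-Agent that the agent adds itself.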

20.1.4. See Also


The documentation for the CPAN module LWP::Simple, and the
lwpcook(1) and lwptut(1)
manpages that came with LWP; the documentation for the modules
LWP::UserAgent, HTTP::Response, and URI::Heuristic;
Recipe 20.2 and Perl & LWP







Copyright © 2003 O'Reilly & Associates. All rights reserved.
