Google Hacks, 2nd Edition [Electronic resources] (text version)


Tara Calishain


Hack 32. Dig Deeper into Sites

Dig deeper into the hierarchies of web sites
matching your search criteria.

One of Google's big strengths is that it can find
your search term instantly and with great precision. But sometimes
you're not interested so much in one definitive
result as in lots of diverse results; maybe you even want some that
are a bit more on the obscure side.
One method I've found rather useful is to ignore all
results shallower than a particular level in a
site's directory hierarchy. You avoid all the
clutter of finds on home pages and go for subject matter otherwise
hidden away in the depths of a site's structure.
While content comes and goes, ebbs and flows from a
site's main focus, it tends to gather in more
permanent locales, categorized and archived, like with like.

This script asks for a query along with a preferred depth, above
which results are thrown out. Depth is the number of path segments
after the hostname. Specify a depth of four and your results will
come only from pages at least as deep as
http://example.com/a/b/c/d, not /a/, /a/b/, or /a/b/c.
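
To make the depth count concrete, here is a minimal sketch of the same calculation the script performs on each result URL. The helper name url_depth and the sample URLs are assumptions made for illustration; they are not part of the hack itself.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): count how many
# path segments follow the hostname in a URL.
sub url_depth {
    my ($url) = @_;
    # Strip the leading protocol and a single trailing slash.
    $url =~ s!^\w+://|/$!!g;
    # The first field is the hostname, so subtract one.
    my @parts = split /\//, $url;
    return @parts - 1;
}

print url_depth('http://example.com/a/b/c/d'), "\n";   # prints 4
print url_depth('http://example.com/a/b/'), "\n";      # prints 2
print url_depth('http://example.com/'), "\n";          # prints 0
```

With a requested depth of four, only the first of these three URLs would survive the filter.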

Because you're already limiting the kinds of results
that you see, it's best to use more common words for
what you're looking for. Obscure query terms can
often return absolutely no results.


The number of loops, each retrieving 10 items, defaults to 10 in the
code below. This is to ensure that you glean a decent number of
results, because many will be tossed. You can, of course, alter this
number, but bear in mind that every loop spends one query from your
daily quota of 1,000 Google API queries per developer's key.
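
The trade-off between loops and quota is simple arithmetic, sketched roughly below. The figures of 10 results per query and 1,000 queries per day come from the text; the helper searches_per_day is a hypothetical name, and the sketch assumes every search runs all of its loops.

```perl
use strict;
use warnings;

# Figures taken from the text: 10 results per query and a daily
# quota of 1,000 queries per developer's key.
my $results_per_query = 10;
my $daily_quota       = 1000;

# Hypothetical helper: how many complete searches fit in one day's
# quota when each search issues $loops queries.
sub searches_per_day {
    my ($loops) = @_;
    return int($daily_quota / $loops);
}

for my $loops (10, 50) {
    printf "%d loops: up to %d results scanned, %d searches per day\n",
        $loops, $loops * $results_per_query, searches_per_day($loops);
}
```

In practice a search may stop early when Google runs out of results, so these numbers are a worst case.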


2.14.1. The Code


Save this code as deep_blue_g.cgi, a CGI script
(see "How to Run the Hacks" in the Preface), on your web server.
As you type it in, replace the text "insert key here" with your
Google API key.

#!/usr/local/bin/perl
# deep_blue_g.cgi
# Limiting search results to a particular depth in a web
# site's hierarchy.
# deep_blue_g.cgi is called as a CGI with form input.

# Your Google API developer's key.
my $google_key='insert key here';

# Location of the GoogleSearch WSDL file.
my $google_wsdl = "./GoogleSearch.wsdl";

# Number of times to loop, retrieving 10 results at a time.
my $loops = 10;

use SOAP::Lite;
use CGI qw/:standard *table/;

# Print the page header and the search form.
print
  header( ),
  start_html("Fishing in the Deep Blue G"),
  h1("Fishing in the Deep Blue G"),
  start_form(-method=>'GET'),
  'Query: ', textfield(-name=>'query'),
  br( ),
  'Depth: ', textfield(-name=>'depth', -default=>4),
  br( ),
  submit(-name=>'submit', -value=>'Search'),
  end_form( ), p( );

# Make sure a query and numeric depth are provided.
if (param('query') and param('depth') =~ /^\d+$/) {

  # Create a new SOAP object.
  my $google_search = SOAP::Lite->service("file:$google_wsdl");

  for (my $offset = 0; $offset < $loops*10; $offset += 10) {
    my $results = $google_search ->
      doGoogleSearch(
        $google_key, scalar param('query'), $offset, 10, "false", "", "false",
        "", "latin1", "latin1"
      );

    # Stop as soon as Google runs out of results.
    last unless @{$results->{resultElements}};

    foreach my $result (@{$results->{'resultElements'}}) {

      # Determine depth: strip the protocol and any trailing slash,
      # leaving the hostname followed by the path segments.
      my $url = $result->{URL};
      $url =~ s!^\w+://|/$!!g;

      # Output only those deep enough.
      ( split(/\//, $url) - 1) >= param('depth') and
        print
          p(
            b(a({href=>$result->{URL}}, $result->{title}||'no title')), br( ),
            $result->{URL}, br( ),
            i($result->{snippet}||'no snippet')
          );
    }
  }
}

# Close the page whether or not a search was run.
print end_html;

2.14.2. Running the Hack


This hack runs as a CGI script. Point your browser at
deep_blue_g.cgi, fill out the query and depth
fields, and click the Search button.

Figure 2-8 shows a query for "Jacques Cousteau",
restricting results to a depth of six;
that's six levels down from the
site's home page. You'll notice
some pretty long URLs in there.



Figure 2-8. A search for "Jacques Cousteau", restricting results to six levels down



2.14.3. Hacking the Hack


Perhaps you're interested in just the opposite of
what this hack provides: you want only results from higher up in a
site's hierarchy. Hacking this hack is simple
enough: swap in a < (less than) symbol for
the > (greater than) in the depth test, so that the line reads:

( split(/\//, $url) - 1) <= param('depth') and
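
Going one step further, the two comparisons can be combined to keep only results within a band of depths. The following is a minimal sketch under that assumption; the helper depth_in_range and its min/max parameters are hypothetical and do not appear in the original script, which takes a single depth field.

```perl
use strict;
use warnings;

# Hypothetical helper: true when a URL's depth lies between
# $min and $max, inclusive.
sub depth_in_range {
    my ($url, $min, $max) = @_;
    # Same normalization as the hack: drop protocol and trailing slash.
    $url =~ s!^\w+://|/$!!g;
    # Hostname plus path segments; subtract one for the hostname.
    my @parts = split /\//, $url;
    my $depth = @parts - 1;
    return $depth >= $min && $depth <= $max;
}

my @urls = (
    'http://example.com/',
    'http://example.com/a/b/',
    'http://example.com/a/b/c/d/e/f',
);

# Keep only URLs between one and three levels down.
print "$_\n" for grep { depth_in_range($_, 1, 3) } @urls;
```

Of the three sample URLs, only http://example.com/a/b/ falls inside the one-to-three band and is printed.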
