Google Hacks, 2nd Edition [Electronic resources], text version

Tara Calishain

Hack 43. Scattersearch with Yahoo! and Google

Sometimes, illuminating results can be found
when scraping from one site and feeding the results into the API of
another. With scattersearching, you can narrow down the most popular
related results, as suggested by Yahoo! and Google.

We've combined a scrape of a Yahoo! web page with a
Google search [Hack #41],
blending scraped data with data generated via a web service API to
good effect. In this hack, we're doing something
similar, except this time we're taking the results
of a Yahoo! search and blending them with a Google search.

Yahoo! has a "Related searches"
feature, where you enter a search term and get a list of related
terms under the search box, if any are available. This hack scrapes
those related terms and runs a Google search for each one, restricted
to page titles with allintitle:. It then reports the estimated result
count for each search, along with a direct link to the results. Aside from showing how
scraped and API-generated data can live together in harmony, this
hack is good to use when you're exploring concepts;
for example, you might know that something called
Pokemon exists, but you might not know anything
about it. You'll get Yahoo!'s
related searches and an idea of how many results each of those
searches generates in Google. From there, you can choose the search
terms that generate the most results or look the most promising based
on your limited knowledge, or you can simply pick a road that appears
less traveled.
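The core loop of the hack, turning each related term into an allintitle: query for the API and a link to a regular Google results page, can be tried on its own without touching the network. What follows is a minimal sketch; the helper names (form_escape, title_query, google_link) and the sample terms are hypothetical. Unlike the full script, which only swaps spaces for + and double quotes for %22, it percent-encodes every character outside the unreserved set.

```perl
#!/usr/bin/perl
# Sketch only: build the allintitle: queries and Google result links that
# scattersearch.pl produces. Helper names and sample terms are hypothetical.
use strict;
use warnings;

# Form-encode a value: keep unreserved characters, turn spaces into "+",
# and percent-encode everything else (double quotes become %22).
sub form_escape {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-_.~ ])/sprintf('%%%02X', ord $1)/eg;
    $s =~ tr/ /+/;
    return $s;
}

# The query sent to the Google API for one related term.
sub title_query {
    my ($term) = @_;
    return "allintitle:$term";
}

# A link to up to 100 ordinary Google results for the plain term,
# mirroring the link the hack prints next to each count.
sub google_link {
    my ($term) = @_;
    return 'http://www.google.com/search?q=' . form_escape($term) . '&num=100';
}

my @related = ('siamese cat', 'siamese twins');    # hypothetical sample terms
for my $term (@related) {
    printf "%-28s %s\n", title_query($term), google_link($term);
}
```

Running it prints one line per term, pairing the API query with the link the results page would use.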


2.25.1. The Code


Save the following code to a file called
scattersearch.pl.


Bear in mind that this hack, while using the Google API for the
Google portion, involves some scraping of Yahoo!'s
search pages and thus is rather brittle. If it stops working at any
point, take a gander at the regular expressions; they're almost sure
to be the breakage point.

#!/usr/bin/perl -w
#
# Scattersearch -- Use the search suggestions from
# Yahoo! to build a series of intitle: searches at Google.
use strict;
use LWP;
use URI;
use SOAP::Lite;
use CGI qw/:standard/;
# Get our query, else die miserably.
my $query = shift @ARGV; die "Usage: $0 query\n" unless $query;
# Your Google API developer's key.
my $google_key = 'insert key here';
# Location of the GoogleSearch WSDL file.
my $google_wdsl = "./GoogleSearch.wsdl";
# Search Yahoo! for the query.
my $ua = LWP::UserAgent->new;
my $url = URI->new('http://search.yahoo.com/search');
$url->query_form(rs => "more", p => $query);
my $response = $ua->get($url);
die "Couldn't fetch Yahoo! results: ", $response->status_line, "\n"
    unless $response->is_success;
my $yahoosearch = $response->content;
# Strip form feeds, tabs, and line breaks so one regex can span the page.
$yahoosearch =~ s/[\f\t\n\r]//g;
# And determine if there were any results. The closing </div> marker
# is an assumption about Yahoo!'s markup; adjust it if the layout differs.
my ($recommended) = $yahoosearch =~ m!Also try:(.*?)</div>!is;
die "Sorry, there were no results!\n" unless $recommended;
# Now, add all our results into
# an array for Google processing.
my @googlequeries;
while ($recommended =~ m!<a href=".*?">(.*?)</a>!gis) {
    my $searchitem = $1;
    # Drop <nobr> tags, any other markup, and slashes from the link text.
    $searchitem =~ s/nobr|<[^>]*>|\///g;
    # Report progress on STDERR so it doesn't end up in the HTML output.
    print STDERR "$searchitem\n";
    push (@googlequeries, $searchitem);
}
# Print our header for the results page.
print join "\n",
    start_html("ScatterSearch"),
    h1("Your Scattersearch Results"),
    p("Your original search term was '$query'"),
    p("That search had " . scalar(@googlequeries) . " recommended terms."),
    p("Here are result numbers from a Google search"),
    CGI::start_ol( );
# Create our Google object for API searches.
my $gsrch = SOAP::Lite->service("file:$google_wdsl");
# Running the actual Google queries.
foreach my $googlesearch (@googlequeries) {
    my $titlesearch = "allintitle:$googlesearch";
    # Arguments: key, query, start, maxResults, filter, restrict,
    # safeSearch, lr, ie, oe. One result is enough to get the count.
    my $count = $gsrch->doGoogleSearch($google_key, $titlesearch,
                                       0, 1, "false", "", "false",
                                       "", "", "");
    # Encode spaces and double quotes for the results link.
    my $url = $googlesearch; $url =~ s/ /+/g; $url =~ s/"/%22/g;
    print li("There were $count->{estimatedTotalResultsCount} ".
             "results for the recommended search <a href=\"http://www.".
             "google.com/search?q=$url&num=100\">$googlesearch</a>");
}
print CGI::end_ol( ), end_html;
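Because Yahoo!'s markup changes, it helps to check the two scraping regular expressions against a saved page before blaming anything else. This sketch wraps them in a hypothetical related_terms() function and runs it on a made-up HTML fragment shaped the way the patterns expect; it is not real Yahoo! output, and the closing </div> marker is an assumption. To test against the live site, save a results page with your browser and pass its contents in instead.

```perl
#!/usr/bin/perl
# Sketch: run the scattersearch scraping regexes against saved HTML.
# related_terms() and the sample markup are hypothetical.
use strict;
use warnings;

# Return the related terms found in a page, or an empty list.
sub related_terms {
    my ($html) = @_;
    $html =~ s/[\f\t\n\r]//g;                           # same cleanup as the hack
    my ($block) = $html =~ m!Also try:(.*?)</div>!is;   # assumed end marker
    return () unless defined $block;
    my @terms;
    while ($block =~ m!<a href=".*?">(.*?)</a>!gis) {
        (my $term = $1) =~ s/nobr|<[^>]*>|\///g;        # strip tags and slashes
        push @terms, $term;
    }
    return @terms;
}

# Made-up markup in the shape the regexes expect.
my $sample = <<'HTML';
<div>Also try: <a href="/search?p=siamese+cat"><nobr>siamese cat</nobr></a>,
<a href="/search?p=siamese+twins"><nobr>siamese twins</nobr></a></div>
HTML

print "$_\n" for related_terms($sample);
```

If the function returns nothing for a page that visibly contains related searches, the "Also try:" pattern or its closing marker is the first thing to adjust.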


2.25.2. Running the Hack


This script writes an HTML page to standard output, ready for you to
save and upload to a publicly accessible web site. If you want to save
the output of a search for siamese to a file called
scattersearch.html in your
Sites directory, run the following command
["How to Run the Hacks" in the
Preface]:

% perl scattersearch.pl "siamese" > ~/Sites/scattersearch.html

Your final results, as rendered by your browser, will look similar to
Figure 2-15.


Figure 2-15. Scattersearch results for siamese


You'll have to do a little experimenting to find out
which terms have related searches. Broadly speaking, very general
search terms are bad; it's better to zero in on
terms that people would search for and that would be easy to group
together. At the time of this writing, for example,
heart has no related search terms, but
blood pressure does.


2.25.3. Hacking the Hack


You have two choices: you can either hack the interaction with Yahoo!
or expand it to include something in addition to or instead of Yahoo!
itself. Let's look at Yahoo! first. If you take a
close look at the code, you'll see
we're passing an unusual parameter to our Yahoo!
search results page:

$url->query_form(rs => "more", p => $query);

The rs => "more" part of the query tells Yahoo! to show its related
search terms; requesting them this way returns up to 10 related
searches. If you remove that portion of the code,
you'll get roughly four related searches when
they're available. That might suit you if you want
only a few, but perhaps you want dozens and dozens! In that case,
replace "more" with "all".
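If you do switch between rs modes, one option is to make the mode an argument and to cap how many scraped terms reach the Google API, since the script makes one API request per term. This is a sketch under those assumptions: yahoo_params() and cap_terms() are hypothetical helpers, and the cap is whatever number you pass in.

```perl
#!/usr/bin/perl
# Sketch: pick Yahoo!'s rs mode ("more", "all", or none) and cap how many
# related terms are sent to the Google API. Both helpers are hypothetical.
use strict;
use warnings;

# Key/value pairs suitable for $url->query_form; omit rs when no mode is given.
sub yahoo_params {
    my ($query, $mode) = @_;
    my @params = (p => $query);
    unshift @params, (rs => $mode) if defined $mode && length $mode;
    return @params;
}

# Keep only the first $max terms so a long "all" list can't drain the quota.
sub cap_terms {
    my ($max, @terms) = @_;
    return @terms <= $max ? @terms : @terms[0 .. $max - 1];
}

my @params = yahoo_params('siamese', 'all');
print join(', ', @params), "\n";    # prints: rs, all, p, siamese
print "$_\n" for cap_terms(2, 'siamese cat', 'siamese twins', 'siamese kittens');
```

In scattersearch.pl you could then write $url->query_form(yahoo_params($query, "all")); and loop over cap_terms(10, @googlequeries) instead of @googlequeries.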

Beware, though: this can generate a lot of related searches, and it
can certainly eat up your daily allowance of Google API requests.
Tread carefully.

Kevin Hemenway and Tara Calishain
