Hack 36. Search Google Topics

Google doesn't talk about it much, but it does make specialty web
searches available. And I'm not just talking about
searches limited to a certain domain. I'm talking
about searches that are devoted to a particular topic (http://www.google.com/options/specialsearchesl).
The Google API makes four of these searches available: U.S.
Government, Linux, BSD, and Macintosh.

In this hack, we'll look at a program that takes a
query from a form and provides a count of results for that query in each
specialty topic, as well as the top result for each topic.
2.18.1. Why Topic Search?
Why would you want to topic search? Because Google currently indexes
over eight billion pages. If you try to do more than very specific
searches, you might find yourself with far too many results. If you
narrow down your search by topic, you can get good results without
having to zero in on your search exactly.

You can also use it to do some decidedly unscientific research. Which
topic contains more iterations of the phrase "open
source"? Which contains the most pages from
.edu (educational) domains? Which topic,
Macintosh or FreeBSD, has more on user interfaces? Which topic holds
the most for Monty Python fans?
2.18.2. The Code
Save the following code as a CGI script ["How to Run
the Hacks" in the Preface] named
gootopic.cgi in the cgi-bin
directory on your web server:

    #!/usr/local/bin/perl
    # gootopic.cgi
    # Queries across Google Topics (and All of Google), returning
    # number of results and top result for each topic.
    # gootopic.cgi is called as a CGI with form input

    use strict;
    use SOAP::Lite;
    use CGI qw/:standard *table/;

    # Your Google API developer's key.
    my $google_key='insert key here';

    # Location of the GoogleSearch WSDL file.
    my $google_wdsl = "./GoogleSearch.wsdl";

    # Google Topics: restrict value => display label.
    my %topics = (
      ''       => 'All of Google',
      unclesam => 'U.S. Government',
      linux    => 'Linux',
      mac      => 'Macintosh',
      bsd      => 'FreeBSD'
    );

    # Display the query form.
    print
      header( ),
      start_html("GooTopic"),
      h1("GooTopic"),
      start_form(-method=>'GET'),
      'Query: ', textfield(-name=>'query'), ' ',
      submit(-name=>'submit', -value=>'Search'),
      end_form( ), p( );

    my $google_search = SOAP::Lite->service("file:$google_wdsl");

    # Perform the queries, one for each topic area.
    if (param('query')) {
      print
        start_table({-cellpadding=>'10', -border=>'1'}),
        Tr([th({-align=>'left'}, ['Topic', 'Count', 'Top Result'])]);

      foreach my $topic (keys %topics) {
        # The sixth argument (restrict) selects the topic; '' means no restriction.
        my $results = $google_search ->
          doGoogleSearch(
            $google_key, param('query'), 0, 10, "false", $topic, "false",
            "", "latin1", "latin1"
          );

        my $result_count = $results->{'estimatedTotalResultsCount'};
        my $top_result = 'no results';

        if ( $result_count ) {
          # First result element: title, linked URL, and snippet.
          my $t = $results->{'resultElements'}[0];
          $top_result =
            b($t->{title}||'no title') . br( ) .
            a({href=>$t->{URL}}, $t->{URL}) . br( ) .
            i($t->{snippet}||'no snippet');
        }

        # Output one table row per topic.
        print Tr([ td([
            $topics{$topic},
            $result_count,
            $top_result
          ])
        ]);
      }

      print end_table( );
    }

    print end_html( );

Be sure to replace insert key here with
your Google API key.
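
The call to doGoogleSearch in the script passes ten positional arguments, which makes it easy to lose track of which slot selects the topic. The following is a minimal sketch, not part of the original hack, that builds the same argument list from named values. The helper name search_args and the %DEFAULTS table are hypothetical; the argument order (key, q, start, maxResults, filter, restrict, safeSearch, lr, ie, oe) is the one the script above relies on, and the defaults simply mirror the values it passes.

```perl
use strict;
use warnings;

# Positional order of the doGoogleSearch arguments, as used in gootopic.cgi.
my @ARG_ORDER = qw(key q start maxResults filter restrict safeSearch lr ie oe);

# Defaults mirroring the values gootopic.cgi passes (an assumption of this sketch).
my %DEFAULTS = (
    start      => 0,
    maxResults => 10,
    filter     => 'false',
    restrict   => '',        # '' means "All of Google"
    safeSearch => 'false',
    lr         => '',
    ie         => 'latin1',
    oe         => 'latin1',
);

# Hypothetical helper: turn named arguments into the positional list.
# Names not listed in @ARG_ORDER are silently ignored.
sub search_args {
    my (%named) = @_;
    my %args = (%DEFAULTS, %named);
    for my $required (qw(key q)) {
        die "missing required argument '$required'\n"
            unless defined $args{$required};
    }
    return map { $args{$_} } @ARG_ORDER;
}

# Build the argument list for a Linux-topic search and show it.
my @linux_args = search_args(
    key      => 'insert key here',
    q        => 'open source',
    restrict => 'linux',
);
print join(' | ', map { "'$_'" } @linux_args), "\n";
```

With a helper like this, the loop body in the script could call $google_search->doGoogleSearch(search_args(key => $google_key, q => param('query'), restrict => $topic)) and leave the remaining slots to the defaults.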
2.18.3. Running the Hack
Point your web browser at gootopic.cgi. Provide a query and the script will search for your query in each
special topic area, providing you with an overall
("All of Google") count, topic area
count, and the top result for each. Figure 2-11
shows a sample run for "user
interface", with Macintosh (surprisingly) not
coming out on top.
Figure 2-11. Topic search for "user interface"
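
Because the script loops over keys %topics, the table rows come out in whatever order Perl's hash yields, which can differ from one run to the next. A minimal sketch of one way to pin the order follows, assuming you want the unrestricted "All of Google" row first and the topics after it alphabetically by label. The helper name ordered_topics is hypothetical and not part of the original script.

```perl
use strict;
use warnings;

# Same topic table as in gootopic.cgi.
my %topics = (
    ''       => 'All of Google',
    unclesam => 'U.S. Government',
    linux    => 'Linux',
    mac      => 'Macintosh',
    bsd      => 'FreeBSD',
);

# Hypothetical helper: return the restrict keys with the unrestricted
# search ('') first, then the remaining topics sorted by display label.
# Intended to be called in list context.
sub ordered_topics {
    my ($topics) = @_;
    return sort {
        ($a ne '') <=> ($b ne '')
            or $topics->{$a} cmp $topics->{$b}
    } keys %$topics;
}

# Show the resulting row order by label.
print join(', ', map { $topics{$_} } ordered_topics(\%topics)), "\n";
```

Swapping keys %topics for ordered_topics(\%topics) in the foreach loop would be enough to make repeated runs line up row for row.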

2.18.4. Search Ideas
Trying to figure out how many pages each topic finds for particular
top-level domains (e.g., .com,
.edu, .uk) is rather
interesting. You can query for
inurl:xx
site:xx, where
xx is the top-level domain
you're interested in. For example, inurl:va
site:va searches for any of the Vatican's
pages in the various topics; there aren't any.
inurl:mil site:mil finds an overwhelming number of
results in the U.S. Government special topic, which is no surprise.

If you are in the mood for a party game, try to find the weirdest
possible searches that appear in all the special topics.
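
The inurl:xx site:xx pattern described above is easy to generate for a whole list of top-level domains. Here is a minimal sketch, assuming a hypothetical helper named tld_query; it only builds the query strings, which you would then paste into the form or pass to the script as the query parameter.

```perl
use strict;
use warnings;

# Hypothetical helper: build the "inurl:xx site:xx" query for one
# top-level domain. Accepts "edu", ".edu", or "EDU".
sub tld_query {
    my ($tld) = @_;
    $tld = lc $tld;
    $tld =~ s/^\.//;    # drop a leading dot if present
    die "not a plausible top-level domain: '$tld'\n"
        unless $tld =~ /^[a-z]{2,}$/;
    return "inurl:$tld site:$tld";
}

# Print one query per top-level domain mentioned in the text.
print tld_query($_), "\n" for qw(.com .edu .uk va mil);
```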