Hack 37. Find the Largest Page

But how about Feeling Large?

Google sorts your search results by PageRank, which certainly makes
sense. Sometimes, however, you may have a substantially different
focus in mind and want things ordered in some other manner. Recency is
one that comes to mind. Size is another.

In the same manner that Google's "I'm Feeling Lucky" button redirects
you to the search result with the highest PageRank, this hack sends
you directly to the largest (in kilobytes).
2.19.1. The Code
Save the following code as a CGI script ["How to Run the Hacks" in
the Preface] named goolarge.cgi in your web server's cgi-bin
directory. Be sure to replace insert key here with your Google API key.

#!/usr/local/bin/perl
# goolarge.cgi
# A take-off on "I'm Feeling Lucky," redirects the browser to the largest
# (size in K) document found in the first n results. n is set by number
# of loops x 10 results per.
# goolarge.cgi is called as a CGI with form input

# Your Google API developer's key.
my $google_key='insert key here';

# Location of the GoogleSearch WSDL file.
my $google_wdsl = "./GoogleSearch.wsdl";

# Number of times to loop, retrieving 10 results at a time.
my $loops = 10;

use strict;
use SOAP::Lite;
use CGI qw/:standard/;

# Display the query form.
unless (param('query')) {
  print
    header( ),
    start_html("GooLarge"),
    h1("GooLarge"),
    start_form(-method=>'GET'),
    'Query: ', textfield(-name=>'query'),
    ' ',
    submit(-name=>'submit', -value=>"I'm Feeling Large"),
    end_form( ), p( );
}

# Run the query.
else {
  my $google_search = SOAP::Lite->service("file:$google_wdsl");
  my($largest_size, $largest_url) = (0, undef);

  # Offsets 0, 10, ..., ($loops - 1) * 10: exactly $loops requests.
  for (my $offset = 0; $offset < $loops*10; $offset += 10) {
    my $results = $google_search ->
      doGoogleSearch(
        $google_key, param('query'), $offset,
        10, "false", "", "false", "", "latin1", "latin1"
      );

    # Stop paging once Google returns no further results.
    @{$results->{'resultElements'}} or last;

    # Keep track of the largest size and its associated URL.
    # cachedSize looks like "12k"; substr strips the trailing "k".
    foreach (@{$results->{'resultElements'}}) {
      substr($_->{cachedSize}, 0, -1) > $largest_size and
        ($largest_size, $largest_url) =
          (substr($_->{cachedSize}, 0, -1), $_->{URL});
    }
  }

  # Redirect the browser to the largest result, or say that none was found.
  if (defined $largest_url) {
    print redirect $largest_url;
  }
  else {
    print header( ), start_html("GooLarge"), p('No results'), end_html( );
  }
}
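
The selection step inside the loop can be tried on its own, without an API key. What follows is a minimal sketch, assuming each result carries a cachedSize string such as "12k" alongside its URL. The largest_result helper and the sample data are hypothetical, made up for illustration; they are not part of the Google API or of goolarge.cgi.

```perl
use strict;
use warnings;

# Hypothetical helper: return (size, URL) of the result with the largest
# cachedSize. Results with a missing or malformed cached size are skipped.
sub largest_result {
    my @results = @_;
    my ($largest_size, $largest_url) = (0, undef);
    foreach my $result (@results) {
        my $cached = $result->{cachedSize};
        next unless defined $cached && $cached =~ /^(\d+)k$/;
        my $size = $1;
        ($largest_size, $largest_url) = ($size, $result->{URL})
            if $size > $largest_size;
    }
    return ($largest_size, $largest_url);
}

# Made-up sample data standing in for @{$results->{resultElements}}.
my @sample = (
    { URL => 'http://example.com/a', cachedSize => '12k' },
    { URL => 'http://example.com/b', cachedSize => '101k' },
    { URL => 'http://example.com/c', cachedSize => '' },
);

my ($size, $url) = largest_result(@sample);
print "$url ($size K)\n";
```

With this sample data the sketch picks http://example.com/b at 101 K. Unlike the substr approach in the script, the pattern match ignores results that have no cached size at all, which is one possible design choice rather than the hack's own behavior.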
2.19.2. Running the Hack
Point your web browser at the goolarge.cgi CGI script. Enter a query
and click the "I'm Feeling Large" button. You'll be transported
directly to the largest page matching your query (within the first
specified number of results, that is; the default is 100 results:
10 loops of 10 results apiece).
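
Because the form submits with GET and reads a parameter named query, the script can presumably also be called directly with the query in the URL. Here is a minimal sketch of building such a URL, assuming the script lives at the hypothetical address http://localhost/cgi-bin/goolarge.cgi and that queries are plain ASCII. The goolarge_url helper is illustrative only.

```perl
use strict;
use warnings;

# Hypothetical location of the script; adjust to your own server.
my $base = 'http://localhost/cgi-bin/goolarge.cgi';

# Build a GET URL for the script, percent-encoding every character
# that is not an unreserved URL character (ASCII input assumed).
sub goolarge_url {
    my ($query) = @_;
    (my $escaped = $query) =~
        s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord($1))/ge;
    return "$base?query=$escaped";
}

print goolarge_url('ohio state bird'), "\n";
```

This prints http://localhost/cgi-bin/goolarge.cgi?query=ohio%20state%20bird, which can be bookmarked or linked to like any other URL.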
2.19.3. Usage Examples
Perhaps you're looking for bibliographic information about a famous
person. You might find that a regular Google search doesn't net you
any more than a mention on a plethora of content-light web pages.
Running the same query through this hack sometimes turns up pages
with extensive bibliographies.

Maybe you're looking for information about a state. Try queries for
the state name along with related information, such as motto,
capital, or state bird.
2.19.4. Hacking the Hack
This hack isn't so much hacked as tweaked. By
changing the value assigned to the $loops variable
in my $loops = 10;, you can alter the number of
results that the script checks before redirecting you to the largest
result. Remember, the maximum number of results is the number of
loops multiplied by 10 results per loop. The default of 10 considers
the top 100 results. A $loops value of 5 would
consider only the top 50; 20, the top 200; and so forth.
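
The arithmetic can be checked with a short sketch. It assumes one request of 10 results per loop, starting at offset 0, as the text describes; the offsets_for helper is hypothetical and exists only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: list the start offsets requested for a given
# number of loops, at 10 results per request.
sub offsets_for {
    my ($loops) = @_;
    return map { $_ * 10 } 0 .. $loops - 1;
}

foreach my $loops (5, 10, 20) {
    my @offsets = offsets_for($loops);
    printf "%2d loops: offsets %d..%d, %d results considered\n",
        $loops, $offsets[0], $offsets[-1], scalar(@offsets) * 10;
}
```

Each extra loop costs one more API request, so a larger $loops value trades a wider search for a slower redirect.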