Hack 31. Search for Special Characters

Search for special characters in URLs.

Google can find lots of different
things, but at the time of this writing, it can't
find special characters except for $,
recently added for use in number range searches
["Google Web Search Basics" and
"Number Range" in Chapter 1]. That's a shame, because
special characters can come in handy. The tilde
(~), for example, denotes personal web pages.

This hack takes a query from a form, pulls results from Google, and
filters the results for the presence of several different special
characters in the URL, including the tilde.

Why would you want to do this? By altering this hack slightly (see
the "Hacking the Hack" section),
you could restrict your searches to just pages with a tilde in the
URL, an easy way to find personal pages. Maybe
you're looking for dynamically generated pages with
a question mark (?) in the URL; you
can't find these using Google by itself, but you can
with this hack. And, of course, you can turn the hack inside out and
leave out results containing ~,
?, or other special characters. In fact, this code
is more of a beginning than an end unto itself: you can tweak it to
include or exclude whichever characters you care about.
2.13.1. The Code
Save this code to a text file called
aunt_tilde.cgi. Replace insert key
here with your Google API key.

#!/usr/local/bin/perl
# aunt_tilde.cgi
# Finding special characters in Google result URLs.

# Your Google API developer's key.
my $google_key='insert key here';

# Number of times to loop, retrieving 10 results at a time.
my $loops = 10;

# Location of the GoogleSearch WSDL file.
my $google_wdsl = "./GoogleSearch.wsdl";

use strict;
use CGI qw/:standard/;
use SOAP::Lite;

# Print the page header and the search form.
print
  header( ),
  start_html("Aunt Tilde"),
  h1("Aunt Tilde"),
  start_form(-method=>'GET'),
  'Query: ', textfield(-name=>'query'),
  br( ),
  'Characters to find: ',
  checkbox_group(
    -name=>'characters',
    -values=>[qw/ ~ @ ? ! /],
    -defaults=>[qw/ ~ /]
  ),
  br( ),
  submit(-name=>'submit', -value=>'Search'),
  end_form( ), p( );

if (param('query')) {

  # Create a regular expression to match preferred special characters.
  my $special_regex = '[\\' . join('\\', param('characters')) . ']';

  my $google_search = SOAP::Lite->service("file:$google_wdsl");

  # Fetch up to $loops pages of 10 results each.
  for (my $offset = 0; $offset < $loops*10; $offset += 10) {

    my $results = $google_search ->
      doGoogleSearch(
        $google_key, param('query'), $offset, 10, "false", "", "false",
        "", "latin1", "latin1"
      );

    # Stop as soon as Google runs out of results.
    last unless @{$results->{resultElements}};

    foreach my $result (@{$results->{'resultElements'}}) {

      # Output only matched URLs, highlighting special characters in red
      my $url = $result->{URL};
      $url =~ s!($special_regex)!<font color="red">$1</font>!g and
        print
          p(
            b(a({href=>$result->{URL}}, $result->{title}||'no title')), br( ),
            $url, br( ),
            i($result->{snippet}||'no snippet')
          );
    }
  }

  print end_html;
}
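
The filtering step is easier to follow away from the CGI and SOAP plumbing. The following is a minimal sketch, not part of the original script: it builds the character class the same way the hack does and applies the same substitution to a few made-up URLs. The helper names build_special_regex and highlight_special and the example.* addresses are assumptions chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the character class the way the hack does:
# a backslash in front of every selected character.
sub build_special_regex {
    my @chars = @_;
    return '[\\' . join('\\', @chars) . ']';
}

# Return the URL with each match wrapped in red <font> tags,
# or undef when the URL contains none of the characters.
sub highlight_special {
    my ($url, $special_regex) = @_;
    my $count = ($url =~ s!($special_regex)!<font color="red">$1</font>!g);
    return $count ? $url : undef;
}

# Hypothetical sample URLs standing in for Google results.
my @urls = (
    'http://www.example.edu/~tilde/index.html',
    'http://www.example.com/search.cgi?q=perl',
    'http://www.example.org/plain/page.html',
);

my $regex = build_special_regex('~', '?');
for my $url (@urls) {
    my $marked = highlight_special($url, $regex);
    print defined $marked ? "$marked\n" : "(skipped) $url\n";
}
```

The substitution does double duty: it marks up the URL for display, and its return value (the number of replacements) decides whether the result is shown at all.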
2.13.2. Running the Hack
Point your browser at the aunt_tilde.cgi CGI
script, type a search query into the Query field, click the
checkboxes next to the special characters you're
after, and click the Search button.
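
Because the form uses the GET method, a search can also be bookmarked or linked to directly. The sketch below shows roughly what such a request looks like, assuming the script lives at http://localhost/cgi-bin/aunt_tilde.cgi (a made-up location). The helpers uri_encode and build_request_url are hypothetical, and the encoder handles plain ASCII input only; the parameter names query, characters, and submit come from the form in the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Percent-encode everything except letters, digits, and - _ . ~
# (ASCII input assumed).
sub uri_encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord($1))/ge;
    return $text;
}

# Hypothetical helper: assemble the GET request the form would send.
# Each ticked checkbox becomes its own characters=... pair.
sub build_request_url {
    my ($script_url, $query, @characters) = @_;
    my @pairs = ('query=' . uri_encode($query));
    push @pairs, 'characters=' . uri_encode($_) for @characters;
    push @pairs, 'submit=Search';
    return $script_url . '?' . join('&', @pairs);
}

# The host and path are assumptions; use wherever you installed the script.
print build_request_url(
    'http://localhost/cgi-bin/aunt_tilde.cgi',
    'perl tutorial', '~', '?'
), "\n";
```

For the sample call, the query string comes out as query=perl%20tutorial&characters=~&characters=%3F&submit=Search. A browser may write the space as + instead of %20; CGI.pm decodes both forms.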
2.13.3. Hacking the Hack
There are a couple of interesting ways to change this hack.
2.13.3.1 Choosing special characters
You can easily alter the list of special characters that
you're interested in by changing one line in the
script:

-values=>[qw/ ~ @ ? ! /],

Simply add or remove special characters from the space-delimited list
between the / (forward slash) characters. If, for
example, you want to add & (ampersands) and
z (why not?), while dropping ?
(question marks), that line of code should be:

-values=>[qw/ ~ @ ! & z /],
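
One thing to watch when adding letters to the list: the script puts a backslash in front of every selected character. That is harmless for punctuation, but a backslashed letter can mean something else inside a character class; \d, for instance, matches any digit rather than the letter d. The following sketch, which is not part of the original hack, compares the script's approach with one built on Perl's quotemeta function, which escapes only non-word characters. The helper names and sample URL are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The approach used in the script: backslash every character.
sub build_naive_regex {
    my @chars = @_;
    return '[\\' . join('\\', @chars) . ']';
}

# Hypothetical alternative: escape only what needs escaping.
# quotemeta leaves letters, digits, and underscores alone,
# so 'd' stays a literal d instead of becoming \d (any digit).
sub build_safe_regex {
    my @chars = @_;
    return '[' . join('', map { quotemeta($_) } @chars) . ']';
}

my @chars = qw/ ~ @ ! & d /;
my $naive = build_naive_regex(@chars);
my $safe  = build_safe_regex(@chars);

# This sample URL has digits but no letter d and no punctuation
# from the list, so only the naive pattern should match it.
my $url = 'http://www.example.com/2004/';
print "naive $naive: ", ($url =~ /$naive/ ? 'match' : 'no match'), "\n";
print "safe  $safe: ",  ($url =~ /$safe/  ? 'match' : 'no match'), "\n";
```

To try this in the hack itself, the line that builds $special_regex could be swapped for the quotemeta version; the rest of the script would not need to change.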
2.13.3.2 Excluding special characters
You can just as easily decide to exclude URLs that contain
your special characters as include them. Simply change the
=~ (read: does match) in this line:

$url =~ s!($special_regex)!<font color="red">$1</font>!g and

to !~ (read: does not match), leaving:

$url !~ s!($special_regex)!<font color="red">$1</font>!g and

Now, any result containing the specified characters will
not show up.
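
If you would rather switch between the two behaviors without editing the operator each time, one option is to move the test into a small function with a flag. This is only a sketch under that assumption: filter_urls and its $exclude argument are hypothetical names, and the URLs are invented samples.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: keep URLs that contain one of the special
# characters or, when $exclude is true, those that contain none.
sub filter_urls {
    my ($special_regex, $exclude, @urls) = @_;
    if ($exclude) {
        return grep { $_ !~ /$special_regex/ } @urls;
    }
    return grep { $_ =~ /$special_regex/ } @urls;
}

# Invented sample URLs standing in for Google results.
my @urls = (
    'http://www.example.edu/~tilde/index.html',
    'http://www.example.com/search.cgi?q=perl',
    'http://www.example.org/plain/page.html',
);

my $regex = '[\~\?]';
print "Including:\n";
print "  $_\n" for filter_urls($regex, 0, @urls);
print "Excluding:\n";
print "  $_\n" for filter_urls($regex, 1, @urls);
```

In exclude mode there is nothing to highlight, since every URL that survives the filter is one the pattern did not match.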