Google Hacks, 2nd Edition [Electronic resources] (text version)

Tara Calishain

Hack 23. Build Google Directory URLs

Use ODP category information to build URLs for
the Google Directory.

The Google
Directory (http://directory.google.com) overlays the
Open Directory Project (ODP or
DMOZ,
http://www.dmoz.org) ontology
onto the Google core index. The result is a Yahoo!-like directory
hierarchy of search results and their associated categories with the
added magic of Google's popularity algorithms.

The ODP opens its entire database of listings to
anybody, provided you're willing to download a
283 MB file (and that's compressed!). While
you're probably not interested in all the individual
listings, you might want particular ODP categories, or you may be
interested in watching new listings flowing into certain categories.

Unfortunately, the ODP does not offer a way to search by keyword for
sites added within a recent time period. So instead of searching for
recently added sites, the best way to get new site information from
the ODP is to monitor categories.

Because the Google Directory builds its directory based on the ODP
information, you can use the ODP category hierarchy information to
generate Google Directory
URLs. This hack
searches the ODP category hierarchy information for keywords that you
specify, and then builds Google Directory URLs and checks to make
sure that they're active.

You'll need to download the category hierarchy
information from the ODP to get this hack to work. The compressed
file containing this information is available from http://dmoz.org/rdfl, and the specific
file is here: http://dmoz.org/rdf/structure.rdf.u8.gz.
Before using it, you must uncompress it with a decompression
application specific to your operating system. In the Unix
environment, the command looks something like this:

% gunzip structure.rdf.u8.gz


Bear in mind that the full category hierarchy is over 35 MB. If you
just want to experiment with the structure, you can get an excerpt
from http://dmoz.org/rdf/structure.example.txt.
This version is a plain text file and does not require uncompressing.
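
If you would rather not keep a decompressed copy of the hierarchy on disk, one option is to read the compressed file line by line from Perl. The following is a minimal sketch, assuming the IO::Uncompress::Gunzip module is installed (it is bundled with recent Perl releases); the helper name each_gz_line is made up for this illustration and is not part of the hack itself.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper: call $callback once for every line of a
# gzip-compressed file, decompressing on the fly.
sub each_gz_line {
    my ($path, $callback) = @_;
    my $gz = IO::Uncompress::Gunzip->new($path)
        or die "Cannot read $path: $GunzipError\n";
    while (defined(my $line = $gz->getline)) {
        $callback->($line);
    }
    $gz->close;
    return;
}
```

A call such as each_gz_line("structure.rdf.u8.gz", sub { print $_[0] if $_[0] =~ /"Top\// }) would print only the lines that carry a category path.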


2.5.1. The Code


Save the following code to a text file called
google_dir.pl:

#!/usr/bin/perl
# google_dir.pl
# Uses ODP category information to build URLs into the Google Directory.
# Usage: perl google_dir.pl "keywords" < structure.rdf.u8

use strict;
use LWP::Simple;

# Turn off output buffering.
$|++;

my $directory_url = "http://directory.google.com";

@ARGV == 1
  or die qq{usage: perl google_dir.pl "{query}" < structure.rdf.u8\n};

# Grab the keywords given on the command line and join them into a
# regular-expression alternation ("glass stone" becomes glass|stone).
my $keywords = shift @ARGV;
$keywords =~ s!\s+!|!g;

# A place to store topics already seen.
my %topics;

# Loop through the DMOZ category file. For each new category path that
# matches a keyword, print its Google Directory URL if a HEAD request
# succeeds. The (?:...) group keeps the alternation inside the Top/ path.
while (<>) {
  /"(Top\/.*(?:$keywords).*)"/i and !$topics{$1}++
    and head("$directory_url/$1")
      and print "$directory_url/$1\n";
}
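
To see how the keyword matching behaves in isolation, it can help to pull it out into small functions. The sketch below is not the hack's own code: keyword_pattern and match_topic are hypothetical names, the sample line format in the usage note is an assumption about the structure file, and this variant differs from the script above in two ways. It escapes each keyword with quotemeta, and it uses [^"]* instead of .* so that a match cannot run past the closing quote.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a whitespace-separated keyword string into a
# case-insensitive pattern matching a quoted Top/ path that contains
# any one of the keywords.
sub keyword_pattern {
    my ($query) = @_;
    my $alternation = join '|', map { quotemeta($_) } split ' ', $query;
    return qr/"(Top\/[^"]*(?:$alternation)[^"]*)"/i;
}

# Hypothetical helper: return the category path found on a line,
# or undef if the line does not match.
sub match_topic {
    my ($pattern, $line) = @_;
    return $line =~ $pattern ? $1 : undef;
}
```

For example, with my $pattern = keyword_pattern("mosaic glass"), a line such as <Topic r:id="Top/Arts/Crafts/Mosaics"> yields the path Top/Arts/Crafts/Mosaics, while a line whose quoted path contains neither keyword yields undef.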

2.5.2. Running the Hack


Run the script from the command line ["How to Run
the Hacks" in the Preface], along with a query and
the redirected contents of the DMOZ category file:

% perl google_dir.pl "keywords" < structure.rdf.u8

Replace keywords with the particular
keywords that you're after.

If you're using the shorter category excerpt
structure.example.txt, use this:

% perl google_dir.pl "keywords" < structure.example.txt

2.5.3. The Results


Feeding the keyword mosaic into this hack would
look something like this:

% perl google_dir.pl "mosaic" < structure.rdf.u8
http://directory.google.com/Top/Arts/Crafts/Mosaics
http://directory.google.com/Top/Arts/Crafts/Mosaics/Glass
http://directory.google.com/Top/Arts/Crafts/Mosaics/Ceramic_and_Broken_China
http://directory.google.com/Top/Arts/Crafts/Mosaics/Associations_and_Directories
http://directory.google.com/Top/Arts/Crafts/Mosaics/Stone
http://directory.google.com/Top/Shopping/Crafts/Mosaics
http://directory.google.com/Top/Shopping/Crafts/Supplies/Mosaics
...


2.5.4. Hacking the Hack


There isn't much hacking that you can do to this
hack; it's designed to take ODP data, create Google
URLs, and verify those URLs. How well you can get this to work for
you really depends on the types of search words that you choose.

Choose words that are more general. If you're
interested in a particular state in the U.S., for example, choose the
name of the state and major cities, but don't choose
the name of a very small town or of the governor. Choose the name of
a company but not of its CFO. A good rule of thumb is to choose the
keywords that you might find as entry names in an encyclopedia or
almanac. You can easily imagine finding a company name as an
encyclopedia entry, but it's a rare CFO who could
achieve the same.
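
One rough way to judge whether a keyword is general enough is to count how many distinct categories it matches before running the full hack. The following is a minimal sketch under that assumption; count_matches is a hypothetical helper, and it expects the lines of the structure file to have been read into an array already.

```perl
use strict;
use warnings;

# Hypothetical helper: count the distinct Top/ category paths that
# contain each keyword (case-insensitive). $lines is an array reference
# holding lines from the ODP structure file.
sub count_matches {
    my ($lines, @keywords) = @_;
    my %count;
    $count{$_} = 0 for @keywords;
    my %seen;
    for my $line (@$lines) {
        next unless $line =~ /"(Top\/[^"]*)"/;
        my $topic = $1;
        next if $seen{$topic}++;    # count each category only once
        for my $keyword (@keywords) {
            $count{$keyword}++ if $topic =~ /\Q$keyword\E/i;
        }
    }
    return %count;
}
```

A keyword that comes back with a count of zero, as the name of a very small town probably would, is a sign that a broader term is worth trying.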

