Hack 90 Spellcheck All Your Auctions
spelled correctly.

The success of any auction is
largely due to how readily it can be found in eBay searches. As
described in Chapter 2, eBay searches show only
exact matches (with very few exceptions), which means, among other
things, that spelling most definitely counts.

Neither eBay's Sell Your Item form nor Turbo Lister
supports spellchecking of any kind. So it's left to
sellers to scrutinize their titles and auction descriptions, and to
obnoxious bidders to point out any mistakes. Once again, the API
comes to the rescue.
8.10.1 The Script
This script requires the following modules and programs:
| Module/program name | Available at |
|---|---|
| HTML::FormatText (by Sean M. Burke) | search.cpan.org/perldoc?HTML::FormatText |
| HTML::TreeBuilder (by Sean M. Burke) | search.cpan.org/perldoc?HTML::TreeBuilder |
| HTML::Entities (by Gisle Aas) | search.cpan.org/perldoc?HTML::Entities |
| Lingua::Ispell (by John Porter) | search.cpan.org/perldoc?Lingua::Ispell |
| ispell program (by Geoff Kuenning) | fmg-www.cs.ucla.edu/geoff/ispelll |
#!/usr/bin/perl
require 'ebay.pl';
require HTML::TreeBuilder;
require HTML::FormatText;
use Lingua::Ispell qw( spellcheck );
Lingua::Ispell::allow_compounds(1);

$out1 = "";        # report text for the auction being checked
$outall = "";      # report text for all auctions
$numchecked = 0;
$numfound = 0;
$today = &formatdate(time);
$yesterday = &formatdate(time - 86400);

my $page_number = 1;
PAGE:
while (1) {
    my $rsp = call_api({ Verb => 'GetSellerList',                  # [1]
                         DetailLevel => 0,
                         UserId => $user_id,
                         StartTimeFrom => $yesterday,
                         StartTimeTo => $today,
                         PageNumber => $page_number
    });
    if ($rsp->{Errors}) {
        print_error($rsp);
        last PAGE;
    }
    foreach (@{$rsp->{SellerList}{Item}}) {
        my %i = %$_;
        $id = $i{Id};
        if (! -e "$localdir/$id") {
            # Fetch the full listing so the description is available
            my $rsp = call_api({ Verb => 'GetItem',
                                 DetailLevel => 2,
                                 Id => $id
            });
            if ($rsp->{Errors}) {
                print_error($rsp)
            } else {
                my %i = %{$rsp->{Item}[0]};
                my ($title, $description) = @i{qw/Title Description/};
                $spellthis = $title . " " . $description;                   # [2]
                $tree = HTML::TreeBuilder->new_from_content($spellthis);    # [3]
                $formatter = HTML::FormatText->new();
                $spellthat = $formatter->format($tree);
                $tree = $tree->delete;                                      # [4]
                for my $r ( spellcheck( $spellthat ) ) {                    # [5]
                    if ( $r->{'type'} eq 'miss' ) {
                        $out1 = $out1."'$r->{'term'}'";
                        $out1 = $out1." - near misses: @{$r->{'misses'}}\n";
                        $numfound++;
                    }
                    elsif ( $r->{'type'} eq 'guess' ) {
                        $out1 = $out1."'$r->{'term'}'";
                        $out1 = $out1." - guesses: @{$r->{'guesses'}}\n";
                        $numfound++;
                    }
                    elsif ( $r->{'type'} eq 'none' ) {
                        $out1 = $out1."'$r->{'term'}'";
                        $out1 = $out1." - no match.\n";
                        $numfound++;
                    }
                }
                $numchecked++;
                if ($out1 ne "") {
                    $outall = $outall."Errors in #$id '$title':\n";
                    $outall = $outall."$out1\n\n";
                    $out1 = "";
                }
            }
        }
    }
    last PAGE unless $rsp->{SellerList}{HasMoreItems};
    $page_number++;
}
print "$numfound spelling errors found in $numchecked auctions:\n\n";  # [6]
print "$outall\n";
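The script gets its 24-hour window from formatdate, which is defined in ebay.pl and not shown here. As a rough sketch of what such a helper might look like, the following uses only core Perl. The names format_gmt and start_window are hypothetical, and the "YYYY-MM-DD HH:MM:SS" GMT layout is an assumption; the real formatdate may format dates differently.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical stand-in for formatdate() from ebay.pl: renders an epoch
# time as "YYYY-MM-DD HH:MM:SS" in GMT. The layout is an assumption.
sub format_gmt {
    my ($epoch) = @_;
    return strftime('%Y-%m-%d %H:%M:%S', gmtime($epoch));
}

# Return the (from, to) pair covering the $hours hours ending at $now.
sub start_window {
    my ($now, $hours) = @_;
    return (format_gmt($now - $hours * 3600), format_gmt($now));
}

my ($from, $to) = start_window(time, 24);
print "StartTimeFrom => $from\n";
print "StartTimeTo   => $to\n";
```

Passing a larger number of hours to start_window widens the range, which is handy when the script is run by hand rather than on a daily schedule.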
This script is based on the one in [Hack #87], but has a few important additions
and changes.

First, instead of listing recently completed auctions, the
GetSellerList API call (line [1]) is used to retrieve auctions that have
started in the last 24 hours. This will work perfectly if the script
is run every 24 hours, say, at 3:00 P.M. every day, as described in
[Hack #17].

Second, since we want the auction descriptions, we need to use the
GetItem API call for each auction we spellcheck.
This means that spellchecking a dozen auctions will require 13 API
calls: one call to retrieve the list, and one for each auction.

The code actually responsible for performing spellcheck starts on
line [2], where the title and description
are concatenated into a single variable,
$spellthis, so that only one spellcheck is
necessary for each auction. Next, the
HTML::FormatText module is used (lines [3] to [4]) to convert
any HTML-formatted text to plain text.

Finally, the Lingua::Ispell module [5] uses the external ispell
program to perform a spellcheck on $spellthat (the
cleaned-up version of $spellthis). As errors are
found, suggestions are recorded into the $out1
variable, which is merged with $outall and
displayed when the spellcheck is complete.
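The three branches that build $out1 differ only in their label and in which key holds the suggestions, so the reporting step can be pulled into one helper. The sketch below is one possible way to do that, not part of the original script. The name report_results and the allow list (a hash of lowercased terms to ignore, such as odd product names) are hypothetical, and the sample data is hand-built to mimic the result hashes the script reads, so it runs without ispell installed.

```perl
use strict;
use warnings;

# Label for each result type the script reports, and the key holding
# that type's suggestions (the 'none' type has no suggestions).
my %label = (
    miss  => [ 'near misses', 'misses'  ],
    guess => [ 'guesses',     'guesses' ],
    none  => [ 'no match',    undef     ],
);

# Turn a list of spellcheck() results into report text, skipping any
# term found in the (hypothetical) allow list. Returns the text and
# the number of problems reported.
sub report_results {
    my ($results, $allow) = @_;
    my ($text, $count) = ('', 0);
    for my $r (@$results) {
        my $info = $label{ $r->{type} } or next;   # ignore other types
        next if $allow->{ lc $r->{term} };         # seller-approved word
        my ($name, $key) = @$info;
        $text .= "'$r->{term}' - $name";
        $text .= defined $key ? ": @{ $r->{$key} || [] }\n" : ".\n";
        $count++;
    }
    return ($text, $count);
}

# Hand-built sample results, for illustration only.
my @sample = (
    { type => 'miss', term => 'auctoin', misses => [ 'auction' ] },
    { type => 'none', term => 'Zorkmid' },
    { type => 'ok',   term => 'camera' },
);
my ($text, $count) = report_results(\@sample, { zorkmid => 1 });
print "$count problem(s):\n$text";
```

In the script itself, the returned text would be appended to $outall and the count added to $numfound.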
8.10.2 Hacking the Hack
Here are a few things you might want to do with this script:

Instead of simply printing out the results of the spellcheck, as the
script does on line [6], you can quite
easily have the results emailed to you. See [Hack #93] for an example.

Currently, the script performs a spellcheck on every running auction
started in the last 24 hours. If you run the script every 24 hours,
then this won't pose a problem. But if you choose to
run the script manually and therefore specify a broader range of
dates, you may wish to include error checking to prevent the script
from needlessly checking the same auction twice.

If you're especially daring, you can have the
spellchecker submit the revisions for you, although I would never
trust a spellchecker to know how to spell all the weird names of my
items.
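One simple way to keep the script from checking the same auction twice is to record each item number in a text file and consult it before calling GetItem. The following is a minimal sketch under that assumption; load_seen, mark_checked, and the spellchecked.txt file name are all hypothetical, and the item numbers are made up.

```perl
use strict;
use warnings;

# Load previously checked item IDs (one per line) from $file.
# A missing file simply yields an empty set.
sub load_seen {
    my ($file) = @_;
    my %seen;
    if (open my $fh, '<', $file) {
        chomp(my @ids = <$fh>);
        close $fh;
        @seen{@ids} = (1) x @ids;
    }
    return \%seen;
}

# Record $id as checked. Returns true the first time an ID is seen and
# false on repeats, so the caller can skip the GetItem call.
sub mark_checked {
    my ($seen, $file, $id) = @_;
    return 0 if $seen->{$id};
    open my $fh, '>>', $file or die "Cannot append to $file: $!";
    print {$fh} "$id\n";
    close $fh;
    return $seen->{$id} = 1;
}

my $file = 'spellchecked.txt';        # hypothetical file name
my $seen = load_seen($file);
for my $id (qw(1001 1002 1001)) {     # made-up item numbers
    print mark_checked($seen, $file, $id) ? "check $id\n" : "skip $id\n";
}
```

In the script, the call to mark_checked would wrap the GetItem request inside the foreach loop, so a wider date range costs API calls only for auctions that have not been checked before.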