Perl Cd Bookshelf [Electronic resources] نسخه متنی

اینجــــا یک کتابخانه دیجیتالی است

با بیش از 100000 منبع الکترونیکی رایگان به زبان فارسی ، عربی و انگلیسی

Perl Cd Bookshelf [Electronic resources] - نسخه متنی

| نمايش فراداده ، افزودن یک نقد و بررسی
افزودن به کتابخانه شخصی
ارسال به دوستان
جستجو در متن کتاب
بیشتر
تنظیمات قلم

فونت

اندازه قلم

+ - پیش فرض

حالت نمایش

روز نیمروز شب
جستجو در لغت نامه
بیشتر
لیست موضوعات
توضیحات
افزودن یادداشت جدید



1.22. Soundex Matching


1.22.1. Problem



You have two English surnames and
want to know whether they sound somewhat similar, regardless of
spelling. This would let you offer users a "fuzzy search" of names in
a telephone book to catch "Smith" and "Smythe" and others within the
set, such as "Smite" and
"Smote".

1.22.2. Solution


Use
the standard Text::Soundex module:

use Text::Soundex;
$CODE = soundex($STRING);
@CODES = soundex(@LIST);

Or use the CPAN module
Text::Metaphone:

use Text::Metaphone;
$phoned_words = Metaphone('Schwern');

1.22.3. Discussion


The soundex algorithm hashes words (particularly English surnames)
into a small space using a simple model that approximates an English
speaker's pronunciation of the words. Roughly speaking, each word is
reduced to a four-character string. The first character is an
uppercase letter; the remaining three are digits. By comparing the
soundex values of two strings, we can guess whether they sound
similar.

The following program prompts for a name and looks for similarly
sounding names from the password file. This same approach works on
any database with names, so you could key the database on the soundex
values if you wanted to. Such a key wouldn't be unique, of course.

use Text::Soundex;
use User::pwent;
print "Lookup user: ";
chomp($user =<STDIN>);
exit unless defined $user;
$name_code = soundex($user);
while ($uent = getpwent( )) {
($firstname, $lastname) = $uent->gecos =~ /(\w+)[^,]*\b(\w+)/;
if ($name_code eq soundex($uent->name) ||
$name_code eq soundex($lastname) ||
$name_code eq soundex($firstname) )
{
printf "%s: %s %s\n", $uent->name, $firstname, $lastname;
}
}

The
Text::Metaphone module from CPAN addresses the same problem in a
different and better way. The soundex function
returns a letter and a three-digit code that maps just the beginning
of the input string, whereas Metaphone returns a
code as letters of variable length. For example:

soundex metaphone
Christiansen C623 KRSXNSN
Kris Jenson K625 KRSJNSN
Kyrie Eleison K642 KRLSN
Curious Liaison C624 KRSLSN

To
get the most of Metaphone, you should also use the String::Approx
module from CPAN, described more fully in
Recipe 6.13. It allows for there to be errors in the match
and still be successful. The edit distance is
the number of changes needed to go from one string to the next. This
matches a pair of strings with an edit distance of two:

if (amatch("string1", [2], "string2") { }

There's also an
adist function that reports the edit distance. The
edit distance between "Kris Jenson" "Christiansen" is 6, but between
their Metaphone encodings is only 1. Likewise, the distance between
the other pair is 8 originally, but down to 1 again if you compare
Metaphone encodings.

use Text::Metaphone qw(Metaphone);
use String::Approx qw(amatch);
if (amatch(Metaphone($s1), [1], Metaphone($s1)) {
print "Close enough!\n";
}

This would successfully match both of our example pairs.

1.22.4. See Also


The documentation for the standard Text::Soundex and User::pwent
modules; the Text::Metaphone and String::Approx modules from CPAN;
your system's passwd(5) manpage; Volume 3,
Chapter 6 of The Art of Computer Programming, by
Donald E. Knuth (Addison-Wesley)



1.21. Constant Variables1.23. Program: fixstyle




Copyright © 2003 O'Reilly & Associates. All rights reserved.

/ 875