Perl CD Bookshelf [Electronic resources] (text version)

6.12. Honoring Locale Settings in Regular Expressions


6.12.1. Problem



You want to convert case according to the rules of a different locale,
or you want \w to match letters with diacritics, such as those in José
or déjà vu.

For example, let's say you're given half a gigabyte of text written
in German and told to index it. You want to extract words (with
\w+) and convert them to lowercase (with lc or \L), but the normal
versions of \w and lc neither match the German words in full nor
change the case of accented letters.

6.12.2. Solution



Perl's regular-expression and text-manipulation routines have hooks to
the POSIX locale setting. Under the use locale pragma, accented
characters are handled correctly, assuming a reasonable LC_CTYPE
specification and system support for the same.

use locale;
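
The pragma only helps once a suitable LC_CTYPE locale is actually in effect, and locale names differ from system to system. As a minimal sketch, the helper below tries a list of candidate names and reports the first one the system accepts. The name first_available_locale and the candidate locale names are assumptions made for illustration; they are not part of the recipe, and your system may use different names.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Try each candidate name in turn and return the first one that
# setlocale accepts, or nothing if none of them is available.
# Hypothetical helper: the candidate names are guesses, because
# POSIX does not standardize locale names.
sub first_available_locale {
    my @candidates = @_;
    for my $name (@candidates) {
        return $name if setlocale(LC_CTYPE, $name);
    }
    return;
}

my $chosen = first_available_locale(qw(de_DE.ISO8859-1 de_DE.iso88591 de_DE));
if (defined $chosen) {
    print "LC_CTYPE set to $chosen\n";
} else {
    warn "No German locale found; \\w keeps its default meaning\n";
}
```

Code that needs locale-aware matching would then place use locale in the scope where the matching and case conversion take place.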

6.12.3. Discussion


By default, \w+ and case-mapping functions operate
on upper- and lowercase letters, digits, and underscores. This works
only for the simplest of English words, failing even on many common
imports. The use locale
directive redefines what a "word character" means.
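
To see the default behavior in isolation, here is a small sketch with no use locale and no Unicode decoding in effect. It assumes the input is a plain byte string in ISO 8859-1, where \xF6 is o-umlaut; the helper name default_words is made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a byte string into "words" using Perl's default \w rules.
# Neither "use locale" nor Unicode semantics applies here, so bytes
# above 0x7F are not treated as word characters.
sub default_words {
    my ($text) = @_;
    return $text =~ /\b(\w+)\b/g;
}

my $name  = "andreas k\xF6nig";    # ISO 8859-1 bytes
my @words = default_words($name);

# The o-umlaut byte acts as a word boundary, so the surname is cut in two.
print scalar(@words), " words: ", join("|", map { ucfirst } @words), "\n";
```

Here the surname is split at the accented letter, so three fragments come back instead of two words.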

In Example 6-7 you see the difference in output
between having selected the English ("en") locale and the German
("de") one.

Example 6-7. localeg


    #!/usr/bin/perl -w
    # localeg - demonstrate locale effects
    use locale;
    use POSIX 'locale_h';

    $name = "andreas k\xF6nig";
    # Locale names are system-specific; substitute names your system provides.
    @locale{qw(German English)} = qw(de_DE.ISO_8859-1 us-ascii);

    # Extract and capitalize words under the English locale.
    setlocale(LC_CTYPE, $locale{English})
        or die "Invalid locale $locale{English}";
    @english_names = ( );
    while ($name =~ /\b(\w+)\b/g) {
        push(@english_names, ucfirst($1));
    }

    # Repeat under the German locale, where \xF6 is a letter.
    setlocale(LC_CTYPE, $locale{German})
        or die "Invalid locale $locale{German}";
    @german_names = ( );
    while ($name =~ /\b(\w+)\b/g) {
        push(@german_names, ucfirst($1));
    }

    print "English names: @english_names\n";
    print "German names: @german_names\n";

Output:

    English names: Andreas K Nig
    German names: Andreas König

This approach relies on POSIX locale support for 8-bit character
encodings, which your system may or may not provide. Even if your
system does claim to provide POSIX locale support, the standard does
not specify the locale names, so portability of this approach is not
assured. If your data is already in Unicode, you don't need POSIX
locales for this to work.
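
As a rough sketch of the Unicode route, the code below decodes ISO 8859-1 bytes into a Perl character string with the core Encode module and then applies the same \w+ and ucfirst logic with no locale involved. The helper name unicode_names and the choice of ISO 8859-1 as the input encoding are assumptions made for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Decode ISO 8859-1 bytes into a character string, then extract and
# capitalize the words. On a decoded string \w follows Unicode rules,
# so no POSIX locale is required.
sub unicode_names {
    my ($bytes) = @_;
    my $text = decode('ISO-8859-1', $bytes);
    return map { ucfirst } $text =~ /\b(\w+)\b/g;
}

binmode(STDOUT, ':encoding(UTF-8)');    # emit the result as UTF-8
my @names = unicode_names("andreas k\xF6nig");
print "Unicode names: @names\n";
```

Because the decoding step names the input encoding explicitly, the result does not depend on which locales happen to be installed.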

6.12.4. See Also


The treatment of \b, \w, and
\s in perlre(1) and in the
"Classic Perl Character Class Shortcuts" section of Chapter 5 of
Programming Perl; the treatment of locales in
Perl in perllocale(1); your system's
locale(3) manpage; we discuss locales in greater
depth in Recipe 6.2; the "POSIX—An
Attempt at Standardization" section of Chapter 3 of
Mastering Regular Expressions
