
1.5. Using Named Unicode Characters
1.5.1. Problem
You want to use Unicode names for fancy
characters in your code without worrying about their code points.
1.5.2. Solution
Place a use charnames at the top of your file, then freely insert
"\N{CHARSPEC}" escapes into your string literals.
1.5.3. Discussion
The
use charnames pragma lets you
use symbolic names for Unicode characters. These are compile-time
constants that you access with the
\N{CHARSPEC} double-quoted
string sequence. Several subpragmas are supported. The
:full subpragma grants access to the full range of
character names, but you have to write them out in full, exactly as
they occur in the Unicode character database, including the loud,
all-capitals notation. The :short subpragma gives
convenient shortcuts. Any import without a colon tag is taken to be a
script name, giving case-sensitive shortcuts for those scripts.

    use charnames ':full';
    print "\N{GREEK CAPITAL LETTER DELTA} is called delta.\n";

    Δ is called delta.

    use charnames ':short';
    print "\N{greek:Delta} is an upper-case delta.\n";

    Δ is an upper-case delta.

    use charnames qw(cyrillic greek);
    print "\N{Sigma} and \N{sigma} are Greek sigmas.\n";
    print "\N{Be} and \N{be} are Cyrillic bes.\n";

    Σ and σ are Greek sigmas.
    Б and б are Cyrillic bes.

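A named escape is only another spelling of a code point, which the following minimal sketch tries to make concrete by comparing a \N{...} escape with the equivalent \x{...} escape. The binmode call and the variable names are additions for illustration, not part of the recipe; the sketch assumes your terminal expects UTF-8.

```perl
use strict;
use warnings;
use charnames ':full';

# Assumption: output goes to a UTF-8 terminal. Without an encoding layer,
# printing these characters can trigger "Wide character" warnings.
binmode(STDOUT, ':encoding(UTF-8)');

# The same character written two ways: by Unicode name and by code point.
my $by_name = "\N{GREEK CAPITAL LETTER DELTA}";
my $by_code = "\x{394}";

print "Both escapes produce $by_name\n" if $by_name eq $by_code;
printf "Length %d, code point U+%04X\n", length($by_name), ord($by_name);
```

Because the name is resolved when the string literal is compiled, the resulting string holds a single character, exactly as if you had typed the hex escape yourself.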
Two
functions, charnames::viacode and
charnames::vianame, can translate between numeric
code points and the long names. The Unicode documents use the
notation U+XXXX to indicate the Unicode
character whose code point is XXXX, so we'll use
that here in our output.

    use charnames qw(:full);
    for $code (0xC4, 0x394) {
        printf "Character U+%04X (%s) is named %s\n",
            $code, chr($code), charnames::viacode($code);
    }

    Character U+00C4 (Ä) is named LATIN CAPITAL LETTER A WITH DIAERESIS
    Character U+0394 (Δ) is named GREEK CAPITAL LETTER DELTA

    use charnames qw(:full);
    $name = "MUSIC SHARP SIGN";
    $code = charnames::vianame($name);
    printf "%s is character U+%04X (%s)\n",
        $name, $code, chr($code);

    MUSIC SHARP SIGN is character U+266F (♯)

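The two functions are inverses of each other for assigned characters, and charnames::vianame returns undef for a name it does not know. The sketch below illustrates both points; describe_name is a hypothetical helper invented for this example, not part of the charnames module.

```perl
use strict;
use warnings;
use charnames ':full';

# Hypothetical helper: format a character name as "NAME = U+XXXX".
# Returns nothing (undef in scalar context) when the name is unknown.
sub describe_name {
    my ($name) = @_;
    my $code = charnames::vianame($name);
    return unless defined $code;
    return sprintf "%s = U+%04X", $name, $code;
}

# Round trip: long name -> code point -> long name.
my $code = charnames::vianame("GREEK CAPITAL LETTER DELTA");
my $name = charnames::viacode($code);

print describe_name("MUSIC SHARP SIGN"), "\n";
print "Round trip ok\n" if $name eq "GREEK CAPITAL LETTER DELTA";
```

Checking the return value for definedness before formatting it is a reasonable habit whenever the name comes from user input rather than from a literal in your program.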
Here's how to find the path to Perl's copy of the Unicode character
database:

    % perl -MConfig -le 'print "$Config{privlib}/unicore/NamesList.txt"'
    /usr/local/lib/perl5/5.8.1/unicore/NamesList.txt

Read this file to learn the character names available to you.
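Rather than paging through the whole file, you could search it for names matching a pattern. This is only a sketch: find_names is a hypothetical helper, and it assumes that character entries in the file begin with a hexadecimal code point, a tab, and then the name, with all other lines (headers, comments, cross-references) starting differently. It also assumes the file is present at the path shown above, which may not hold on every installation.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: read NamesList-style lines from a filehandle and
# return [code point, name] pairs whose name matches $pattern.
# Assumed entry layout: "0394<TAB>GREEK CAPITAL LETTER DELTA".
sub find_names {
    my ($fh, $pattern) = @_;
    my @found;
    while (my $line = <$fh>) {
        # Skip anything that is not a "HEX<TAB>NAME" entry line.
        next unless $line =~ /^([0-9A-F]{4,6})\t(.+?)\s*$/;
        my ($hex, $name) = ($1, $2);
        push @found, [hex($hex), $name] if $name =~ $pattern;
    }
    return @found;
}

my $path = "$Config{privlib}/unicore/NamesList.txt";
if (open(my $fh, '<', $path)) {
    printf "U+%04X %s\n", @$_ for find_names($fh, qr/\bDELTA\b/);
    close($fh);
} else {
    # The file may be absent; report that instead of dying.
    print "No readable file at $path\n";
}
```

Taking a filehandle instead of a path keeps the helper easy to try on a small sample, for instance an in-memory string opened with open(my $fh, '<', \$text).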
1.5.4. See Also
The charnames(3) manpage and Chapter 31 of
Programming Perl; the Unicode Character Database
at http://www.unicode.org/

Copyright © 2003 O'Reilly & Associates. All rights reserved.