8.3 Character Encodings
Text representation has
traditionally been one of the most difficult problems of
internationalization. Java, however, solves this problem quite
elegantly and hides the difficult issues. Java uses Unicode
internally, so it can represent essentially any character in any
commonly used written language. As I noted earlier, the remaining
task is to convert Unicode to and from locale-specific encodings.
Java includes quite a few internal byte-to-char and char-to-byte
converters that handle converting locale-specific character encodings
to Unicode and vice versa. Although the converters themselves are not
public, they are accessible through the InputStreamReader and
OutputStreamWriter classes, which are character streams included in
the java.io package.

Any program can automatically handle locale-specific encodings simply
by using these character stream classes to do its textual input and
output. Note that the FileReader and FileWriter classes use these
streams to automatically read and write text files that use the
platform's default encoding.

Example 8-2
shows a simple program that works with character encodings. It
converts a file from one specified encoding to another by converting
from the first encoding to Unicode and then from Unicode to the
second encoding. Note that most of the program is taken up with the
mechanics of parsing argument lists, handling exceptions, and so on.
Only a few lines are required to create the InputStreamReader and
OutputStreamWriter objects that perform the two halves of the
conversion. Also note that exceptions are handled by calling
LocalizedError.display(). This method is not part of the Java API; it
is a custom method shown in Example 8-5 at the end of this chapter.
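
Before the full listing, the two halves of the conversion can be sketched in isolation. The following is a minimal illustration, not part of the book's example code: the class name TranscodeSketch and its transcode() helper are hypothetical, and the sketch assumes Java 7 or later for try-with-resources. It works on in-memory byte arrays rather than files so that the effect on the bytes is easy to see.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

/** Hypothetical helper for illustration only. */
class TranscodeSketch {
    /** Decode bytes from one encoding into Unicode chars, then encode them in another. */
    static byte[] transcode(byte[] input, String from, String to) throws IOException {
        ByteArrayOutputStream bytesOut = new ByteArrayOutputStream();
        // try-with-resources closes both streams; closing the writer flushes it.
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(input), from);
             Writer w = new OutputStreamWriter(bytesOut, to)) {
            char[] buffer = new char[1024];
            int len;
            while ((len = r.read(buffer)) != -1) {  // First half: bytes to Unicode.
                w.write(buffer, 0, len);            // Second half: Unicode to bytes.
            }
        }
        return bytesOut.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] latin1 = { (byte) 0xE9 };            // U+00E9 (e with acute) in ISO-8859-1
        byte[] utf8 = transcode(latin1, "ISO-8859-1", "UTF-8");
        for (byte b : utf8) System.out.printf("%02X ", b & 0xFF);  // prints C3 A9
        System.out.println();
    }
}
```

The single ISO-8859-1 byte becomes two UTF-8 bytes because the character passes through Unicode on the way. The same pattern applies to file streams, which is what the full example does.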
Example 8-2. ConvertEncoding.java
package je3.i18n;
import java.io.*;

/** A program to convert from one character encoding to another */
public class ConvertEncoding {
    public static void main(String[] args) {
        String from = null, to = null;
        String infile = null, outfile = null;
        for(int i = 0; i < args.length; i++) {  // Parse command-line arguments.
            if (i == args.length-1) usage();    // All args require another.
            if (args[i].equals("-from")) from = args[++i];
            else if (args[i].equals("-to")) to = args[++i];
            else if (args[i].equals("-in")) infile = args[++i];
            else if (args[i].equals("-out")) outfile = args[++i];
            else usage();
        }

        try { convert(infile, outfile, from, to); }  // Attempt conversion.
        catch (Exception e) {                        // Handle exceptions.
            LocalizedError.display(e);  // Defined at the end of this chapter.
            System.exit(1);
        }
    }

    public static void usage() {
        System.err.println("Usage: java ConvertEncoding <options>\n" +
                           "Options:\n\t-from <encoding>\n\t" +
                           "-to <encoding>\n\t" +
                           "-in <file>\n\t-out <file>");
        System.exit(1);
    }

    public static void convert(String infile, String outfile,
                               String from, String to)
        throws IOException, UnsupportedEncodingException
    {
        // Set up byte streams.
        InputStream in;
        if (infile != null) in = new FileInputStream(infile);
        else in = System.in;
        OutputStream out;
        if (outfile != null) out = new FileOutputStream(outfile);
        else out = System.out;

        // Use default encoding if no encoding is specified.
        if (from == null) from = System.getProperty("file.encoding");
        if (to == null) to = System.getProperty("file.encoding");

        // Set up character streams.
        Reader r = new BufferedReader(new InputStreamReader(in, from));
        Writer w = new BufferedWriter(new OutputStreamWriter(out, to));

        // Copy characters from input to output.  The InputStreamReader
        // converts from the input encoding to Unicode, and the
        // OutputStreamWriter converts from Unicode to the output encoding.
        // Characters that cannot be represented in the output encoding are
        // output as '?'
        char[] buffer = new char[4096];
        int len;
        while((len = r.read(buffer)) != -1)  // Read a block of input.
            w.write(buffer, 0, len);         // And write it out.
        r.close();                           // Close the input.
        w.close();                           // Flush and close output.
    }
}
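
The comment in convert() notes that characters with no representation in the output encoding are written as '?'. The sketch below illustrates that substitution and one way to detect it in advance. It is an illustration under stated assumptions rather than part of the example: UnmappableSketch, encodeLossy(), and canEncode() are hypothetical names, and the check relies on the public java.nio.charset API available in Java 1.4 and later.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;

/** Hypothetical helper for illustration only. */
class UnmappableSketch {
    /** Encode text through an OutputStreamWriter and show the resulting bytes as text. */
    static String encodeLossy(String text, String encoding) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, encoding)) {
            w.write(text);  // Unmappable characters are replaced, not rejected.
        }
        // ISO-8859-1 maps every byte to exactly one char, so nothing is lost here.
        return bytes.toString("ISO-8859-1");
    }

    /** Ask the charset's encoder whether every character of text can be represented. */
    static boolean canEncode(String text, String encoding) {
        return Charset.forName(encoding).newEncoder().canEncode(text);
    }

    public static void main(String[] args) throws IOException {
        String text = "caf\u00e9 \u20ac5";  // e with acute, then the euro sign
        System.out.println(encodeLossy(text, "US-ASCII"));  // prints caf? ?5
        System.out.println(canEncode(text, "US-ASCII"));    // prints false
        System.out.println(canEncode(text, "UTF-8"));       // prints true
    }
}
```

Because the substitution happens silently, a converter that must not lose data could call a check like canEncode() before writing, or report the problem to the user instead of producing '?' characters.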