Java Examples In A Nutshell (3rd Edition) [Electronic resources] (text version)

O'Reilly Media, Inc

6.3 Regular Expressions and Character Decoding


Example 6-3 demonstrates the text-matching
capabilities of the
java.util.regex package. This BGrep
class is a variant of the Unix
"grep" command for searching files
for text that matches a given regular expression. Unlike
Unix grep, which is
line-oriented, BGrep is block-oriented: the
matched text can span multiple lines, and its location in the file is
indicated by character number rather than line number. Invoke
BGrep with the regular expression to search for
and one or more filenames. Use -i to specify
case-insensitive matching. If the files contain characters in some
encoding other than UTF-8, use the -e option to
specify the encoding. For example, you could use this command to
search a bunch of Java source files for occurrences of
"ByteBuffer",
"CharBuffer", and the like.

java je3.nio.BGrep '[A-Z][a-z]*Buffer' *.java
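
To make the block-oriented behavior concrete, here is a minimal sketch, separate from the book's BGrep code, in which a match spans a line break and is reported by character offset. The class name BlockMatchSketch, its findAll method, and the sample text are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the book's example code.
class BlockMatchSketch {
    /** Returns one "offset: text" entry per match found anywhere in the text. */
    static List<String> findAll(String regex, CharSequence text) {
        List<String> results = new ArrayList<>();
        // The whole text is searched as one block, so \s may consume newlines.
        Matcher m = Pattern.compile(regex, Pattern.MULTILINE).matcher(text);
        while (m.find()) {
            results.add(m.start() + ": " + m.group());
        }
        return results;
    }

    public static void main(String[] args) {
        // The assignment below is split across two lines of "source".
        String text = "ByteBuffer bytes =\n    f.map(mode, 0, size);\n";
        for (String hit : findAll("bytes\\s*=\\s*f\\.map", text)) {
            System.out.println(hit);
        }
    }
}
```

A line-oriented tool would examine each of the two lines separately and would not find this match, because the matched text crosses the newline.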

The java.util.regex package uses a regular
expression syntax that is much like that of Perl 5. Look up
java.util.regex.Pattern in Sun's
javadocs or in Java in a Nutshell for a summary
of this syntax, and look up the Matcher class in
the same package for details on how to use Pattern
objects to match character sequences. If you are not already familiar
with regular expressions, you can find complete details in the book
Mastering Regular Expressions, by Jeffrey Friedl
(O'Reilly).
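
As a small illustration of how Pattern and Matcher work together, the following sketch adds a capturing group to the pattern from the command above and collects the text captured by that group. The class BufferPrefixSketch and its prefixes method are assumptions made for this example and do not appear in the book's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the book's example code.
class BufferPrefixSketch {
    /** Collects the word in front of "Buffer" for every match in the text. */
    static List<String> prefixes(CharSequence text, boolean ignoreCase) {
        int flags = ignoreCase ? Pattern.CASE_INSENSITIVE : 0;
        // A Pattern is the compiled, reusable form of the regular expression.
        Pattern pattern = Pattern.compile("([A-Z][a-z]*)Buffer", flags);
        // A Matcher holds the matching state for one pattern and one text.
        Matcher matcher = pattern.matcher(text);
        List<String> found = new ArrayList<>();
        while (matcher.find()) {
            found.add(matcher.group(1)); // group 1 is the parenthesized part
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(prefixes("ByteBuffer and CharBuffer", false));
        System.out.println(prefixes("bytebuffer", false));
        System.out.println(prefixes("bytebuffer", true));
    }
}
```

The same Pattern object can be reused to create Matcher objects for many different texts, which is why BGrep compiles its pattern once and then creates a new Matcher for each file.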

This program also demonstrates an easy way to read the contents of a
file: simply use the memory-mapping capabilities of
FileChannel to map the contents of the entire
file into a ByteBuffer. In order to perform
pattern matching on the characters in a file, the bytes of the file
must be decoded into characters; this example uses a simple
Charset method to decode a complete
ByteBuffer into a newly allocated
CharBuffer all at once. This
CharBuffer is then used with a
java.util.regex.Matcher object to look for pattern
matches. Later examples in this chapter will illustrate lower-level
character decoding techniques.
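
The map-then-decode step can be tried in isolation with a sketch like the one below. It assumes a newer Java release than the book targets (try-with-resources, java.nio.file.Files, and StandardCharsets), and the class MapAndDecodeSketch with its readAll and roundTrip methods is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class, not part of the book's example code.
class MapAndDecodeSketch {
    /** Maps the whole file into memory and decodes it in a single call. */
    static CharBuffer readAll(Path file, Charset charset) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer bytes =
                channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            // Charset.decode() allocates a new CharBuffer for the decoded text.
            return charset.decode(bytes);
        }
    }

    /** Writes text to a temporary file, then reads it back through readAll(). */
    static String roundTrip(String text, Charset charset) {
        try {
            Path tmp = Files.createTempFile("bgrep-sketch", ".txt");
            tmp.toFile().deleteOnExit(); // clean up when the JVM exits
            Files.write(tmp, text.getBytes(charset));
            return readAll(tmp, charset).toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "caf\u00e9 ByteBuffer";
        System.out.println(roundTrip(text, StandardCharsets.UTF_8));
    }
}
```

Because the decoded CharBuffer is a separate, newly allocated buffer, it remains usable after the channel has been closed.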

Example 6-3. BGrep.java

package je3.nio;
import java.io.*;
import java.nio.*;
import java.nio.charset.*;
import java.nio.channels.*;
import java.util.regex.*;

/**
 * BGrep: a regular expression search utility, like Unix grep, but
 * block-oriented instead of line-oriented. For any match found, the
 * filename and character position within the file (note: not the line
 * number) are printed along with the text that matched.
 *
 * Usage:
 *   java je3.nio.BGrep [options] <pattern> <files>...
 *
 * Options:
 *   -e <encoding> specifies an encoding. UTF-8 is the default
 *   -i enables case-insensitive matching. Use -s also for non-ASCII text
 *   -s enables strict (but slower) processing of non-ASCII characters
 *
 * This program requires that each file to be searched fits into main
 * memory, and so does not work with extremely large files.
 **/
public class BGrep {
    public static void main(String[] args) {
        String encodingName = "UTF-8";  // Default to UTF-8 encoding
        int flags = Pattern.MULTILINE;  // Default regexp flags

        try {  // Fatal exceptions are handled after this try block
            // First, process any options.
            // startsWith() is used so that an empty argument is not an error.
            int nextarg = 0;
            while(args[nextarg].startsWith("-")) {
                String option = args[nextarg++];
                if (option.equals("-e")) {
                    encodingName = args[nextarg++];
                }
                else if (option.equals("-i")) {  // case-insensitive matching
                    flags |= Pattern.CASE_INSENSITIVE;
                }
                else if (option.equals("-s")) {  // Strict Unicode processing
                    flags |= Pattern.UNICODE_CASE;  // case-insensitive Unicode
                    flags |= Pattern.CANON_EQ;      // canonicalize Unicode
                }
                else {
                    System.err.println("Unknown option: " + option);
                    usage();
                }
            }

            // Get the Charset for converting bytes to chars
            Charset charset = Charset.forName(encodingName);

            // Next argument must be a regexp. Compile it to a Pattern object
            Pattern pattern = Pattern.compile(args[nextarg++], flags);

            // Require that at least one file is specified
            if (nextarg == args.length) usage();

            // Loop through each of the specified filenames
            while(nextarg < args.length) {
                String filename = args[nextarg++];
                CharBuffer chars;  // This will hold complete text of the file
                try {  // Handle per-file errors locally
                    // Open a FileChannel to the named file
                    FileInputStream stream = new FileInputStream(filename);
                    FileChannel f = stream.getChannel();

                    // Memory-map the file into one big ByteBuffer. This is
                    // easy but may be somewhat inefficient for short files.
                    ByteBuffer bytes = f.map(FileChannel.MapMode.READ_ONLY,
                                             0, f.size());

                    // We can close the file once it is mapped into memory.
                    // Closing the stream closes the channel, too.
                    stream.close();

                    // Decode the entire ByteBuffer into one big CharBuffer
                    chars = charset.decode(bytes);
                }
                catch(IOException e) {      // File not found or other problem
                    System.err.println(e);  // Print error message
                    continue;               // and move on to the next file
                }

                // This is the basic regexp loop for finding all matches in a
                // CharSequence. Note that CharBuffer implements CharSequence.
                // A Matcher holds state for a given Pattern and text.
                Matcher matcher = pattern.matcher(chars);
                while(matcher.find()) {  // While there are more matches
                    // Print out details of the match
                    System.out.println(filename + ":" +        // file name
                                       matcher.start()+": "+   // character pos
                                       matcher.group());       // matching text
                }
            }
        }
        // These are the things that can go wrong in the code above
        catch(UnsupportedCharsetException e) {  // Unknown encoding name
            System.err.println("Unknown encoding: " + encodingName);
        }
        catch(IllegalCharsetNameException e) {  // Malformed encoding name
            System.err.println("Illegal encoding name: " + encodingName);
        }
        catch(PatternSyntaxException e) {  // Bad pattern
            System.err.println("Syntax error in search pattern:\n" +
                               e.getMessage());
        }
        catch(ArrayIndexOutOfBoundsException e) {  // Wrong number of arguments
            usage();
        }
    }

    /** A utility method to display invocation syntax and exit. */
    public static void usage() {
        System.err.println("Usage: java je3.nio.BGrep [-e <encoding>] [-i] [-s]" +
                           " <pattern> <filename>...");
        System.exit(1);
    }
}
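
The effect of the -s option can be explored with a short sketch like the following, which compiles patterns with and without the two flags that -s adds. The class StrictFlagsSketch and its found method are hypothetical. The CANON_EQ strings are the ones used as an illustration in the javadoc for the Pattern class.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the book's example code.
class StrictFlagsSketch {
    /** Reports whether the regex, compiled with the given flags, occurs in the text. */
    static boolean found(String regex, String text, int flags) {
        return Pattern.compile(regex, flags).matcher(text).find();
    }

    public static void main(String[] args) {
        String lower = "\u00e9";  // lowercase e with acute accent
        String upper = "\u00c9";  // uppercase E with acute accent

        // CASE_INSENSITIVE alone folds case for US-ASCII characters only.
        System.out.println("-i only:      "
            + found(lower, upper, Pattern.CASE_INSENSITIVE));
        // Adding UNICODE_CASE makes the case folding Unicode-aware.
        System.out.println("-i and -s:    "
            + found(lower, upper, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE));

        // CANON_EQ compares by canonical equivalence: "a" followed by a
        // combining ring versus the single precomposed character.
        System.out.println("CANON_EQ off: " + found("a\u030A", "\u00E5", 0));
        System.out.println("CANON_EQ on:  " + found("a\u030A", "\u00E5", Pattern.CANON_EQ));
    }
}
```

Both flags add matching overhead, which is consistent with BGrep leaving them off unless -s is given.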

