Java Examples In A Nutshell (3rd Edition) [Electronic resources]

O'Reilly Media, Inc

3.7 Filtering Character Streams

FilterReader is an abstract class that defines a null filter: it reads characters from a specified Reader and returns them with no modification. In other words, FilterReader's implementations of the Reader methods simply pass each call through to the underlying Reader. A subclass must override at least the two read() methods, read() and read(char[], int, int), to perform whatever filtering is necessary. Some subclasses may override other methods as well. Example 3-6 shows RemoveHTMLReader, a custom subclass of FilterReader that reads HTML text from a stream and filters all of the HTML tags out of the text it returns.
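To see the null filter in action, here is a minimal sketch. The NullFilterDemo class and its helper methods are illustrative only and are not part of the book's je3.io package. FilterReader declares no abstract methods, so an anonymous subclass with an empty body compiles. Reading through it returns the input unchanged, tags and all.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class NullFilterDemo {
    /** Wraps in in a FilterReader subclass that overrides nothing (illustrative helper). */
    static Reader nullFilter(Reader in) {
        // FilterReader is abstract but has no abstract methods, so an empty
        // anonymous subclass compiles; every call is forwarded to in.
        return new FilterReader(in) { };
    }

    /** Reads r to EOF one character at a time and returns what it produced. */
    static String drain(Reader r) {
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = r.read()) != -1) sb.append((char) c);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "<b>Hello</b>, world";
        // Prints the input unchanged: <b>Hello</b>, world
        System.out.println(drain(nullFilter(new StringReader(text))));
    }
}
```

A real filter such as RemoveHTMLReader starts from this pass-through behavior and overrides only the methods whose output it needs to change.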

In the example, we implement the HTML tag filtration in the three-argument version of read(), and then implement the no-argument version in terms of that more complicated version. The example includes a static nested Test class with a main() method that shows how you might use the RemoveHTMLReader class.
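The same pattern applies to any character filter: do the real work in the three-argument read() and define the no-argument read() on top of it. The following sketch uses a hypothetical StripCRReader that drops every '\r' character, which turns CRLF line terminators into LF. The sketch also shows that Reader's read(char[]) convenience method ends up in the three-argument method, because FilterReader does not override it. As a result, both reading styles produce the same text.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical example filter: drops every '\r' (CRLF -> LF). */
class StripCRReader extends FilterReader {
    StripCRReader(Reader in) { super(in); }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) return 0;               // Reader contract: zero-length read returns 0
        int kept = 0;
        while (kept == 0) {                   // retry if every char read was a '\r'
            int n = in.read(buf, off, len);
            if (n == -1) return -1;           // EOF
            int last = off;                   // where the next kept char goes
            for (int i = off; i < off + n; i++)
                if (buf[i] != '\r') buf[last++] = buf[i];  // compact in place
            kept = last - off;
        }
        return kept;
    }

    @Override
    public int read() throws IOException {    // defined in terms of the method above
        char[] one = new char[1];
        return read(one, 0, 1) == -1 ? -1 : one[0];
    }

    /** Filters text, reading either one char at a time or through read(char[]). */
    static String filter(String text, boolean oneAtATime) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new StripCRReader(new StringReader(text))) {
            if (oneAtATime) {
                int c;
                while ((c = r.read()) != -1) sb.append((char) c);
            } else {
                char[] buf = new char[4];     // small buffer forces several calls
                int n;
                // Reader.read(char[]) calls read(buf, 0, buf.length) on this class
                while ((n = r.read(buf)) != -1) sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "one\r\ntwo\r\n";
        System.out.println(filter(text, true).equals(filter(text, false)));  // true
    }
}
```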

Note that we could also define a RemoveHTMLWriter class by performing the same filtration in a FilterWriter subclass. Or, to filter a byte stream instead of a character stream, we could subclass FilterInputStream and FilterOutputStream. RemoveHTMLReader is only one example of a filter stream. Other possibilities include streams that count the number of characters or bytes processed, convert characters to uppercase, extract URLs, perform search-and-replace operations, convert Unix-style LF line terminators to Windows-style CRLF line terminators, and so on.
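Here is a sketch of one possibility from that list: a hypothetical CRLFWriter that uses FilterWriter to convert LF line terminators to CRLF on the output side. One detail matters here. FilterWriter's write(char[], int, int) and write(String, int, int) forward directly to the underlying Writer, so a subclass that overrides only write(int) would let bulk writes bypass the filter. This version overrides all three methods. It also remembers the previous character so that an existing CRLF is not doubled.

```java
import java.io.FilterWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

/** Hypothetical filter: rewrites bare LF line terminators as CRLF. */
class CRLFWriter extends FilterWriter {
    private int prev = -1;  // last char written, so CRLF is not turned into CRCRLF

    CRLFWriter(Writer out) { super(out); }

    @Override
    public void write(int c) throws IOException {
        if (c == '\n' && prev != '\r') out.write('\r');  // insert the missing CR
        out.write(c);
        prev = c;
    }

    // FilterWriter forwards these two straight to out, so they must be
    // overridden too, or bulk writes would skip the conversion.
    @Override
    public void write(char[] cbuf, int off, int len) throws IOException {
        for (int i = off; i < off + len; i++) write(cbuf[i]);
    }

    @Override
    public void write(String str, int off, int len) throws IOException {
        for (int i = off; i < off + len; i++) write(str.charAt(i));
    }

    /** Converts text by writing it through a CRLFWriter into a StringWriter. */
    static String convert(String text) {
        StringWriter sink = new StringWriter();
        try (Writer w = new CRLFWriter(sink)) {
            w.write(text);  // Writer.write(String) calls write(text, 0, text.length())
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toString();
    }

    public static void main(String[] args) {
        // prints a\r\nb\r\nc (control characters shown as escapes)
        System.out.println(convert("a\nb\r\nc").replace("\r", "\\r").replace("\n", "\\n"));
    }
}
```

Routing every character through write(int) keeps the sketch short. A production version would probably convert a whole buffer before passing it on.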

Example 3-6. RemoveHTMLReader.java

package je3.io;
import java.io.*;

/**
 * A simple FilterReader that strips HTML tags (or anything between
 * pairs of angle brackets) out of a stream of characters.
 **/
public class RemoveHTMLReader extends FilterReader {
    /** A trivial constructor.  Just initialize our superclass */
    public RemoveHTMLReader(Reader in) { super(in); }

    private boolean intag = false;  // Used to remember whether we are "inside" a tag

    /**
     * This overrides the pass-through read() method of FilterReader.
     * It calls in.read() to get a buffer full of characters, then strips
     * out the HTML tags.  (in is a protected field of the superclass.)
     **/
    public int read(char[] buf, int from, int len) throws IOException {
        if (len == 0) return 0;  // Zero-length request: return 0 instead of looping forever
        int numchars = 0;        // how many characters have been read
        // Loop, because we might read a bunch of characters, then strip them
        // all out, leaving us with zero characters to return.
        while (numchars == 0) {
            numchars = in.read(buf, from, len); // Read characters
            if (numchars == -1) return -1;      // Check for EOF and handle it.
            // Loop through the characters we read, stripping out HTML tags.
            // Characters not in tags are copied over previous tags.
            int last = from;                    // Where the next non-HTML char goes
            for (int i = from; i < from + numchars; i++) {
                if (!intag) {                        // If not in an HTML tag
                    if (buf[i] == '<') intag = true; // check for tag start
                    else buf[last++] = buf[i];       // and copy the character
                }
                else if (buf[i] == '>') intag = false;  // check for end of tag
            }
            numchars = last - from; // Figure out how many characters remain
        }                           // And if it is more than zero characters
        return numchars;            // Then return that number.
    }

    /**
     * This is the other pass-through read() method we have to override.  We
     * implement it in terms of the method above.  The remaining read()
     * methods we inherit from Reader are implemented in terms of these two.
     **/
    public int read() throws IOException {
        char[] buf = new char[1];
        int result = read(buf, 0, 1);
        if (result == -1) return -1;
        else return (int)buf[0];
    }

    /** This class defines a main() method to test the RemoveHTMLReader */
    public static class Test {
        /** The test program: read a text file, strip HTML, print to console */
        public static void main(String[] args) {
            try {
                if (args.length != 1)
                    throw new IllegalArgumentException("Wrong number of args");
                // Create a stream to read from the file and strip tags from it
                BufferedReader in = new BufferedReader(
                    new RemoveHTMLReader(new FileReader(args[0])));
                // Read line by line, printing lines to the console
                String line;
                while ((line = in.readLine()) != null)
                    System.out.println(line);
                in.close();  // Close the stream.
            }
            catch (Exception e) {
                System.err.println(e);
                System.err.println("Usage: java je3.io.RemoveHTMLReader$Test" +
                                   " <filename>");
            }
        }
    }
}
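
For the byte-stream alternative mentioned earlier, the same algorithm carries over to a FilterInputStream subclass almost unchanged. The sketch below is hypothetical: RemoveHTMLInputStream is not part of je3.io. It assumes an ASCII-compatible encoding such as UTF-8, in which '<' and '>' are single bytes that never appear inside a multibyte sequence. Two details differ from the Reader version:

- The no-argument read() must mask the byte with 0xFF, because Java bytes are signed.
- The demo reads through an 8-byte buffer so that tags straddle separate read() calls, which the intag field handles.

As with RemoveHTMLReader, methods the class does not override, such as skip(), still go straight to the underlying stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical byte-stream counterpart of RemoveHTMLReader (assumes UTF-8 input). */
class RemoveHTMLInputStream extends FilterInputStream {
    private boolean intag = false;  // state survives across read() calls

    RemoveHTMLInputStream(InputStream in) { super(in); }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) return 0;
        int kept = 0;
        while (kept == 0) {                      // retry if everything read was a tag
            int n = in.read(buf, off, len);
            if (n == -1) return -1;              // EOF
            int last = off;                      // where the next kept byte goes
            for (int i = off; i < off + n; i++) {
                if (!intag) {
                    if (buf[i] == '<') intag = true;
                    else buf[last++] = buf[i];
                } else if (buf[i] == '>') intag = false;
            }
            kept = last - off;
        }
        return kept;
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        return read(one, 0, 1) == -1 ? -1 : (one[0] & 0xFF);  // mask: bytes are signed
    }

    /** Strips tags from html (encoded as UTF-8), reading through an 8-byte buffer. */
    static String strip(String html) {
        byte[] bytes = html.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream s = new RemoveHTMLInputStream(new ByteArrayInputStream(bytes))) {
            byte[] buf = new byte[8];  // small buffer, so tags span several read() calls
            int n;
            while ((n = s.read(buf)) != -1) out.write(buf, 0, n);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(strip("<p>Caf\u00e9 <b>bold</b> text</p>"));
    }
}
```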