Apache Jakarta and Beyond: A Java Programmeramp;#039;s Introduction [Electronic resources] نسخه متنی

15.2. Excel

The POI APIs for dealing with Excel files are collectively called HSSF.^[2] APIs are provided to manipulate low-level Excel data structures, and much more convenient higher-level APIs for reading, writing, and modifying Excel files are also provided. All of this functionality rests on top of POIFS, and a tiny bit of knowledge of POIFS is necessary to begin an HSSF program.

^[2] HSSF originally stood for "Horrible Spreadsheet Format," but like the original acronym for POI, this has been downplayed because POI has put on a more professional face.

The high-level APIs deal with conceptual entities that will be familiar to anyone who has used Excel. A workbook consists of many sheets, a sheet has many rows, and a row has many cells. Methods are provided to iterate through these collections, as well as address specific elements by number. This is illustrated in Listing 15.3, which displays all the data in a spreadsheet.

Listing 15.3. Using the high-level APIs


package com.awl.toolbook.chapter15;
import java.io.FileInputStream;
import java.util.Iterator;
import org.apache.poi.poifs.filesystem.*;
import org.apache.poi.hssf.usermodel.*;
public class ReadSpreadsheet {
public static void main(String args[])
throws Exception
{
FileInputStream in = new FileInputStream(args[0]);
POIFSFileSystem fs = new POIFSFileSystem(in);
HSSFWorkbook book  = new HSSFWorkbook(fs);
int numSheets = book.getNumberOfSheets();
for(int i=0;i<numSheets;i++) {
HSSFSheet sheet = book.getSheetAt(i);
System.out.println("Sheet: " + i);
Iterator rows = sheet.rowIterator();
while(rows.hasNext()) {
HSSFRow row = (HSSFRow) rows.next();
System.out.println(
"  Row #" + row.getRowNum());
Iterator cells = row.cellIterator();
while(cells.hasNext()) {
HSSFCell cell =
(HSSFCell) cells.next();
switch(cell.getCellType()) {
case HSSFCell.CELL_TYPE_BLANK:
System.out.println("     Blank");
break;
case HSSFCell.CELL_TYPE_FORMULA:
System.out.println(
"     Formula: " +
cell.toString());
break;
case HSSFCell.CELL_TYPE_NUMERIC:
System.out.println(
"     Number: " +
cell.getNumericCellValue());
break;
case HSSFCell.CELL_TYPE_STRING:
System.out.println(
"    String: " +
cell.getStringCellValue());
break;
default:
System.out.println(
"Unknown or unhandled type");
break;
}
}
}
}
}
}

At the lowest level, most of the classes comprising HSSF exactly parallel data structures specified in BIFF8, the underlying format of Excel data as specified in the

Microsoft Excel 97 Developer's Kit , published by Microsoft Press (currently out of print). These data structures are called

records . They are defined in the org.apache.poi.hssf.record package and subpackages. The base class for all of these is called Record, and there are 125 different subclasses of Record listed in the HSSF Javadocs.^[3]

^[3] Only 98 of these classes are included in the most recent jar files. The remaining classes are probably not yet implemented.

There is certainly not space in this book to document all of these subclasses, and in any case, many will be encountered only infrequently. Here are a few of the most common records.

BOFRecord represents the beginning of any of several types of object within a spreadsheet. This has a type field indicating the type of object that may be TYPE_WORKBOOK, TYPE_VB_MODULE, TYPE_WORKSHEET, TYPE_CHART, TYPE_EXCEL_4_MACRO, or TYPE_WORKSPACE_FILE. The most important of these are TYPE_WORKBOOK, which contains the entire data portion of a spreadsheet, and TYPE_WORKSHEET, which indicates the start of a particular sheet.
BoundSheetRecord defines a sheet within a workbook. This contains the name of the sheet and internal information about where in the document the data for the sheet lives.
EOFRecord indicates the end of an object whose start was indicates with a BOFRecord. Objects are nested; a EOFRecord will always match the most recently encountered BOFRecord.
RowRecord contains information about a particular row, including formatting data.
SSTRecord contains a table of strings and other constants used within a sheet.
BlankRecord contains information about a blank cell. This includes the row and column number of the cell as well as formatting information such as font, font size, and vertical and horizontal alignments. This same information is available for any Record that contains cell data, including the next four items.
LabelSSTRecord contains an index into the table provided by the SSTRecord.
LabelRecord is a read-only string used as a label. LabelSSTRecord is more common and more useful.
NumberRecord contains a numeric value in a cell.
FormulaRecord contains a formula within a cell. Formulae are stored in an internal compiled format, but the toString() method can usually convert this back into an expression that would be entered by a user.

Although it is possible to read an Excel spreadsheet directly by using these record objects, it is much more convenient to use the provided event-driven API. This is done by providing a class that implements the HSSFListener interface. This interface requires a processRecord() method that takes a Record as an argument. One or more of these listeners are added to a HSSFRequest, specifying the events in which they are interested. An instance of HSSFEventFactory is responsible for parsing the underlying data and invoking the appropriate listeners. The event API is demonstrated in Listing 15.4 .

Listing 15.4. A program that dumps Excel records


package com.awl.toolbook.chapter15;
import java.io.*;
import org.apache.poi.poifs.filesystem.*;
import org.apache.poi.hssf.record.*;
import org.apache.poi.hssf.record.aggregates.*;
import org.apache.poi.hssf.eventusermodel.*;
public class SimpleReader implements HSSFListener {
public void processRecord(Record record) {
System.out.println("Found record " +
record.getClass().getName());
System.out.println(record.toString());
}
public static void main(String[] args)
throws Exception
{
FileInputStream in = new FileInputStream(args[0]);
POIFSFileSystem fs = new POIFSFileSystem(in);
InputStream workbook =
fs.createDocumentInputStream("Workbook");
HSSFEventFactory factory = new HSSFEventFactory();
HSSFRequest req = new HSSFRequest();
// Listen for all events with this class
req.addListenerForAllRecords(new SimpleReader());
factory.processEvents(req, workbook);
in.close();
workbook.close();
}
}

The main() method uses createDocumentInputStream(), discussed in the section on POIFS, to obtain the Document containing the spreadsheet data. The rest of the method directly follows the preceding discussion.

Figure 15.1 shows a simple spreadsheet tracking developing for two teams. The total and average columns are all given by formulas. When SimpleReader is run on this spreadsheet, it produces a surprising 210 records, and the information dumped in the toString() representation of these records comprises 3074 lines.

Figure 15.1. A sample spreadsheet.

[View full size image]

The next issue that must be addressed is how to handle different record types in the processRecord() method. The Record base class defines a sid, a numeric ID that uniquely identifies each subclass. Each subclass also has a corresponding static sid. Therefore, if a method accepts an argument of type Record, it can determine whether the argument is really of a particular subtypesay, BOFRecordby checking


if(record.getSid() == BOFRecord.sid) {
System.out.println("It is a BOFRecord");
}

Readers who have programmed in C may recognize this pattern as C-style structs and unions. Indeed, the Microsoft specifications define these structures in C terms, and the POI team has translated these into Java in the cleanest and most straightforward way possible. This means that typical code to handle Record objects also resembles C code:


switch (record.getSid()) {
case RowRecord.sid:
RowRecord rowrec = (RowRecord) record;
... handle the row record ...
break;
case NumberRecord.sid:
NumberRecord numrec = (NumberRecord) record;
... handle the number record ...
break;
... other cases ...
}

Clearly, this is not very Java-like. An alternative would be to avoid the sid and work with classes directly, as in


try  {
RowRecord rowrec = (RowRecord) record;
... handle the row record ...
return;
} catch (ClassCastException e) {}
try {
NumberRecord numrec = (NumberRecord) record;
... handle the number record ...
return;
} catch (ClassCastException e) {}

While this example looks more like standard Java, it is grossly inefficient because code may have to go through 124 attempted casts before getting to the right one.

In addition to these deficiencies, both approaches have the disadvantage of creating very long methods in which it may be difficult to find and isolate the clause that deals with a particular subclass of Record.

Provided with this book is a class called HSSFDispatcher^[4] that attempts to remedy some of these problems. This class implements the HSSFListener interface and provides an implementation of processRecord() that uses a switch statement with a case for every sid in the 98 available records. Each clause of these cases casts the Record to the appropriate subtype and then passes it to a method whose name is the name of the subclass, prepended by the word "handle." All of these methods simply pass the Record to a method called handleDefaultRecord(), which does nothing. If the record passed to processRecord() does not match any known type, which may happen if later versions of POI added new record types, then the record is passed to handleUnknownRecord(), which also does nothing. A small portion of HSSFDispatcher is shown below:

^[4] The HSSFDispatcher class is automatically generated by a bash script called makeDispatcher.sh, which is also included on the CD-ROM.


package com.awl.toolbook.chapter16;
import org.apache.poi.hssf.record.*;
import org.apache.poi.hssf.record.aggregates.*;
import org.apache.poi.hssf.eventusermodel.HSSFListener;
public class HSSFDispatcher
implements HSSFListener
{
public void processRecord(Record record) {
switch (record.getSid()) {
case AreaFormatRecord.sid:
handleAreaFormatRecord(
(AreaFormatRecord) record);
break;
case AreaRecord.sid:
handleAreaRecord((AreaRecord) record);
break;
... every other SID ...
default:
handleUnknownRecord(record);
break;
}
}
public void handleAreaFormatRecord(
AreaFormatRecord record)
{
handleDefaultRecord(record);
}
public void handleAreaRecord(AreaRecord record) {
handleDefaultRecord(record);
}
... Every other method ...
public void handleDefaultRecord(Record record) {}
public void handleUnknownRecord(Record record) {}
}

Programs wishing to use HSSF can extend HSSFDispatcher and override those methods that handle records of interest. handleDefaultRecord() can be overridden to catch any remaining records. This approach in used in Listing 15.5.

Listing 15.5. Using the dispatcher


package com.awl.toolbook.chapter15;
import java.io.*;
import org.apache.poi.poifs.filesystem.*;
import org.apache.poi.hssf.record.*;
import org.apache.poi.hssf.record.aggregates.*;
import org.apache.poi.hssf.eventusermodel.*;
public class ExcelReader
extends HSSFDispatcher
{
private SSTRecord sstrec = null;
public void handleBOFRecord(BOFRecord bof) {
if (bof.getType() == bof.TYPE_WORKBOOK) {
System.out.println("Workbook found");
} else if (bof.getType() == bof.TYPE_WORKSHEET) {
System.out.println("Sheet reference found");
}
}
public void handleSSTRecord(SSTRecord record) {
sstrec = record;
}
public void handleMergeCellsRecord(
MergeCellsRecord mcr)
{
for(int i=0;i<mcr.getNumAreas();i++) {
MergeCellsRecord.MergedRegion region =
mcr.getAreaAt(i);
System.out.println("Merged region, from (" +
region.row_from + ", " +
region.col_from + ") to (" +
region.row_to + ", " +
region.col_to + ")");
}
}
public void handleBoundSheetRecord(
BoundSheetRecord bsr)
{
System.out.println("Found sheet named: " +
bsr.getSheetname());
}
public void handleRowRecord(RowRecord rowrec) {
System.out.println("Row found, first column at "
+ rowrec.getFirstCol() +
" last column at " +
rowrec.getLastCol());
}
public void handleNumberRecord(NumberRecord numrec) {
System.out.println("(" + numrec.getRow() + "," +
numrec.getColumn() + "): " +
"number=" +
numrec.getValue());
}
public void handleFormulaRecord(FormulaRecord frec) {
System.out.println("(" + frec.getRow() + "," +
frec.getColumn() + "): " +
"forumla=" +
frec.toString());
}
public void handleLabelSSTRecord(LabelSSTRecord lrec) {
System.out.println("(" + lrec.getRow() + ", " +
lrec.getColumn() + "): " +
"string=" +
sstrec.getString(
lrec.getSSTIndex()));
}
public void handleBlankRecord(BlankRecord blank) {
System.out.println("(" + blank.getRow() + "," +
blank.getColumn() + "): blank");
}
public void handleDefaultRecord(Record rec) {
if(rec instanceof CellValueRecordInterface) {
System.out.println("Unhandled record type: " +
rec.getClass().getName());
}
}
public void handleUnknownRecord(Record rec) {
System.out.println("Unknown record type: " +
rec.getClass().getName());
}
public static void main(String[] args)
throws Exception
{
FileInputStream in = new FileInputStream(args[0]);
POIFSFileSystem fs = new POIFSFileSystem(in);
InputStream workbook =
fs.createDocumentInputStream("Workbook");
HSSFEventFactory factory = new HSSFEventFactory();
HSSFRequest req = new HSSFRequest();
// Listen for all events with this class
req.addListenerForAllRecords(new ExcelReader());
factory.processEvents(req, workbook);
in.close();
workbook.close();
}
}

Here is a partial list of the output from Listing 15.5:


Workbook found
Found sheet named: Sheet1
Found sheet named: Sheet2
Found sheet named: Sheet3
Sheet reference found
Row found, first column at 0 last column at 7
Row found, first column at 0 last column at 7
...
(0,0): string=Code produced by team A
(0,1): blank
(0,2): blank
(0,4): string=Code produced by team B
(0,5): blank
(0,6): blank
(1,0): string=Name
(1,1): string=Classes
(1,2): string=Lines
...
(4,0): string=Octopus, Ophelia
(4,1): number=8.0
(4,2): number=256.0
...
(6,0): string=Total
(6,3): blank
(6,4): blank
(7,0): string=Average
(7,3): blank
(7,4): blank
Merged region, from (0,0) to (0,2)
Merged region, from (0,4) to (0,6)

Apache Jakarta and Beyond: A Java Programmeramp;#039;s Introduction [Electronic resources] نسخه متنی

فارسی

کردی

العربیه

اردو

Türkçe

Русский

English

Français

کانال فیلم من

تبیان من

فایلهای من

کتابخانه من

پنل پیامکی

وبلاگ من

اینجــــا یک کتابخانه دیجیتالی است

با بیش از 100000 منبع الکترونیکی رایگان به زبان فارسی ، عربی و انگلیسی