The load utility is another tool you can use to insert data into a table. Note that you cannot run the load tool against a view; the target must be a table that already exists. The major difference between a load and an import is that a load is much faster. Unlike the import tool, data is not written to the database using normal insert operations. Instead, the load utility reads the input data, formats data pages, and writes directly to the database. Database changes are not logged and constraint validations are not performed during a load operation.
Basically, a complete load process consists of four phases.
During the
load phase , the load utility scans the input file for any invalid data rows that do not comply with the table definition; for example, if a table column is defined as INTEGER but the input data is stored as "abcd". Invalid data will not be loaded into the table. The rejected rows and warnings will be written to a dump file specified by the
dumpfile modifier. Valid data is then written into the table. At the same time, table statistics (if the
statistics option was specified) and index keys are also collected. If the
savecount option is specified in the
load command, points of consistency are recorded in the message file. Consistency points are established by the load utility. They are very useful when it comes to restarting the load operation. You can restart the load from the last successful consistency point.
During the
build phase , indexes are produced based on the index keys collected during the load phase. The index keys are sorted during the load phase, and index statistics are collected (if the
statistic option was specified).
In the
load phase , the utility only rejects rows that do not comply with the column definitions. Rows that violated any unique constraint will be deleted in the delete phase. Note that only unique constraint violated rows are deleted. Other constraints are not checked during this phase or during any load phase. You have to manually check it after the load operation is complete. Refer to Section 12.5.7, Validating Data Against Constraints, for more information.
During the
index copy phase , index data is copied from a system temporary table space to the original table space. This will only occur if a system temporary table space was specified for index creation during a load operation with the
read access option specified (see section 12.5.2.5, Locking Considerations During a Load).
The load utility is so powerful that its command can be executed with many different options. Figure 12.16 presents a simplified version of the
load command syntax diagram.
.-,--------------. V | >>-LOAD--+--------+--FROM----+-filename---+-+--OF--filetype-----> '-CLIENT-' +-pipename---+ +-device-----+ '-cursorname-' >--+-------------------------+----------------------------------> | .-,--------. | | V | | '-LOBS FROM----lob-path-+-' >--+-------------------------------+----------------------------> | .--------------. | | V | | '-MODIFIED BY----filetype-mod-+-' >--+--------------+--+-------------+--+-----------------+-------> '-SAVECOUNT--n-' '-ROWCOUNT--n-' '-WARNINGCOUNT--n-' >--+------------------------+-----------------------------------> '-MESSAGES--message-file-' >--+-------------------------------+--+-INSERT----+-------------> '-TEMPFILES PATH--temp-pathname-' +-REPLACE---+ +-RESTART---+ '-TERMINATE-' >--INTO--table-name--+-------------------------+----------------> | .-,-------------. | | V | | '-(----insert-column-+--)-' >--+---------------------------+--------------------------------> '-FOR EXCEPTION--table-name-' >--+-----------------------------+------------------------------> '-STATISTICS--+-USE PROFILE-+-' '-NO----------' >--+-----------------------------------------------------------------+--> | .-NO----------------------------------------------------. | +-COPY--+-YES--+-USE TSM--+--------------------------+--------+-+-+ '-NONRECOVERABLE--------------------------------------------------' >--+-------------------+--+--------------------------+----------> '-WITHOUT PROMPTING-' '-DATA BUFFER--buffer-size-' >--+--------------------------+--+--------------------+---------> '-SORT BUFFER--buffer-size-' '-CPU_PARALLELISM--n-' >--+---------------------+--------------------------------------> '-DISK_PARALLELISM--n-' >--+--------------------------------+---------------------------> '-INDEXING MODE--+-AUTOSELECT--+-' +-REBUILD-----+ +-INCREMENTAL-+ '-DEFERRED----' .-ALLOW NO ACCESS-----------------------------. >--+---------------------------------------------+--------------> '-ALLOW READ ACCESS--+----------------------+-' '-USE--tablespace-name-' >--+--------------------------------------+---------------------> '-CHECK PENDING CASCADE--+-IMMEDIATE-+-' '-DEFERRED--' >--+-----------------+------------------------------------------> '-LOCK WITH FORCE-'
As you can see, there are many options available to customize your load operation. The following examples illustrate how to use some of them.
In Figure 12.17, data in a DEL input file is loaded into a list of columns in table
stock . The
messages option is used to record warnings and errors encountered during the load operation. This particular load will stop when the threshold of warnings (in this case, 10) is encountered. You can check the output file for warnings and errors.
The
savecount option establishes consistency points after every 1,000 rows are loaded. Because a message is issued at each consistency point, ensure that the
savecount value is sufficiently high to minimize performance impact.
load from stock.del of del
savecount 1000
warningcount 10
messages stock.out
insert into stock(itemid, itemdesc, cost, inventory)
Consistency points are established during the load phase. You can use these to restart a failed or terminated load operation. By specifying the same
load command but replacing
insert with the
restart option, the load operation will automatically continue from the last consistency point.
To terminate a load, issue the same
load command but use the
terminate option in place of
insert . For example:
load from stock.del of del
savecount 1000
warningcount 10
messages stock.out
terminate
The load utility supports four file formats: IXF, DEL, ASC, and CURSOR (described in Section 12.2, Data Movement File Formats). When using the CURSOR file type as demonstrated in Figure 12.18, the cursor must be already declared but does not need to be opened. The entire result of the query associated with the specified cursor will be processed by the load utility. You must also ensure that the column types of the SQL query are compatible with the corresponding column types in the target table.
declare cur1 cursor as select * from oldstock;
load from cur1 of cursor
messages curstock.out
insert into stock
As mentioned earlier, the load process goes through four phases. During the load phase, data that does not comply with the column definition will not be loaded. Rejected records can be saved in a dump file by using the
modified by
dumpfile modifier. If
dumpfile is not specified, rejected records will not be saved. Since the load utility will not stop unless it reaches the warning threshold if one is specified, it is not easy to identify the rejected records. Hence, it is always a good practice to use the modifier and validate the message file after a load is completed. Figure 12.19 shows how to use
modified by
dumpfile .
load from stock.ixf of ixf
modified by dumpfile=stockdump.dmp
messages stock.out
replace into stock
for exception stockexp
Assume that the input file
stock.ixf contains the data in Table 12.3.
itemid | itemdesc | inventory |
---|---|---|
10 | ~~~ | 1 |
20 | ~~~ | |
30 | ~~~ | 3 |
30 | ~~~ | 4 |
40 | ~~~ | X |
50 | ~~~ | 6 |
50 | ~~~ | 7 |
80 | ~~~ | 8 |
The target table
stock is defined with three columns using this
CREATE TABLE statement:
CREATE TABLE stock
( itemid INTEGER NOT NULL
, itemdesc VARCHAR(100)
, inventory INTEGER NOT NULL
, PRIMARY KEY (itemid) )
exception table using the
for exception option. If an exception table is not specified, the rows will be discarded.
You need to create an exception table manually before you can use it. The table should have the same number of columns, column types, and nullability attributes as the target table to be loaded. You can create such a table with this command:
CREATE TABLE stockexp LIKE stock
To log when and why rows are rejected, you can add two other optional columns to the end of the exception table. The first column is defined as a TIMESTAMP data type to record when the record was deleted. The second column is defined as CLOB (32K) or larger and tracks the constraint names that the data violates. To add columns to the table, use the
ALTER TABLE statement:
ALTER TABLE stockexp
ADD COLUMN load_ts TIMESTAMP
ADD COLUMN load_msg CLOB(32k)
Like the
dumpfile modifier, it is a good practice to also use the exception table, especially if unique violations are possible. The exception table illustrated in Table 12.5 contains rows that violated the unique constraints.
itemid | itemdesc | inventory |
---|---|---|
30 | ~~~ | 4 |
50 | ~~~ | 7 |
Figure 12.20 shows the big picture of the concepts of
dumpfile and the exception table.
1. | Create the target table stock . |
2. | Issue the load command with modified by dumpfile , messages , and for exception options. |
3. | Rows that do not comply with the table definition (NOT NULL and numeric column) are recorded in the stockdump.dmp file. |
4. | Rows that violated the unique constraint are deleted from the stock table and inserted into the exception table. |
5. | Four rows are successfully loaded into the stock table. |
In all the examples you have seen so far, the load commands are executed from the database server and the input files are located on the database server. You may sometimes want to invoke a load operation from a remote client as well as using a file that resides at the client. To do so, specify the
client keyword in the command as demonstrated in Figure 12.21.
load client from stock.ixf of ixf
modified by dumpfile=stockdump.dmp
rowcount 5000
messages stock.out
tempfiles path c:\loadtemp
replace into stock
for exception stockexcept
lock with force
You cannot load a CURSOR file type from a client. The
dumpfile and
lobsinfile modifiers (discussed in the following sections) refer to files on the server even when the command includes the
client keyword.
NOTE
Use the
load client command when the input file resides on the client from which you are issuing the command. Use the
dumpfile, tempfile , and
lobsinfile modifiers for files located on the DB2 server.
The
rowcount option works exactly the same as the one supported by the import utility. You can control the number of rows to be loaded with this option.
During the load process, the utility uses temporary files. By default, it allocates temporary files from the directory where the
load command was issued. To explicitly specify a path for this purpose, use the
tempfiles option as shown in Figure 12.21.Notice that the example also uses the
replace mode, which replaces the old data in the target table with the new data.
The utility acquires various locks during the load process. If you choose to give the load operation a higher priority then other concurrent applications, you can specify the
lock with force option (in Figure 12.21) to immediately terminate other applications that are holding conflicting locks so that the load utility does not have to wait for locks.
By default, no other application can access the target table that is being loaded. The utility locks the target table for exclusive access until the load completes. You can set this default behavior with the
allow no access option. This is the only valid option for
load replace .
You can increase concurrency by locking the target table in share mode and allowing read access. In Figure 12.22, the
allow read access option is enabled, which lets readers access data that existed before the load. New data will not be available until the load has completed.
load from stock.ixf of ixf
modified by dumpfile=stockdump.dmp
messages stock.out
replace into stock
for exception stockexcept
allow read access
indexing mode incremental
The last option in Figure 12.22,
indexing mode , indicates whether the load utility is to rebuild indexes or to extend them incrementally. This is done in the build phase. You can use the options in Table 12.6.
INDEXING MODE option | Description |
---|---|
REBUILD | Forces all indexes to be rebuilt. |
INCREMENTAL | Extends indexes with new data. |
AUTOSELECT (default) | The load utility will automatically choose between REBUILD or INCREMENTAL mode. |
DEFERRED | Indexes will not be rebuilt but will be marked as needing a refresh. An index will be rebuilt when it is first accessed or when the database is restarted. |
The file type modifiers supported in the load utility are as comprehensive as those supported in the export and import utilities. The following section discusses a few of the modifiers. Refer to the
DB2 Data Movement Utilities Guide and Reference for a complete list of load utility modifiers.
When you insert data into a table with the insert, import, or load operations, DB2 tries to fit as much of the data into the data and index pages as possible. Consider pages tightly packed as shown in Figure 12.23.
When a certain record is updated with data larger than the original size, new data might not be able to fit into the original data page. DB2 will then search for the next free page to store the updated record. The updated record is referenced from the original page by a
pointer . When a request comes in to retrieve the record, DB2 first locates the original data page and then searches for the new data page as referenced by the pointer. This is called
page overflow (see Figure 12.24).
The higher the number of page overflows, the more time DB2 will spend finding the data or index page. Hence, you want to avoid page overflows as much as possible to improve performance.
To minimize page overflows, you can customize the table definition so that certain free space is reserved so that the pages are not tightly packed. The
CREATE TABLE, ALTER TABLE , and
CREATE INDEX statements have options for leaving free space in data and/or index pages. The
load command also has options to override the default set for the target table. You can specify this using the file type modifiers: indexfreespace, pagefreespace, and totalfreespace.
Modifiers pagefreespace=x and indexfreespace=x can be used to specify the percentage of each data and/or index page that is to be left as free space. For example, Figure 12.25 illustrates leaving 20 percent of free space on each data and index page.
The modifier totalfreespace=x specifies the percentage of the total pages in the table that is to be appended to the end of the table as free space. For example, if x = 20, and the table has 100 data pages after the data has been loaded, 20 additional empty pages will be appended. The total number of data pages for the table will be 120 (see Figure 12.26).
The load utility uses the same option and modifier as the import utility to specify the path where LOBs are stored. For example, the following command lists the directories where LOBs are stored with the
load from stock.ixf of ixf
lobs from c:\lobs1, c:\lobs2, c:\lobs3
modified by dumpfile=stockdump.dmp lobsinfile
messages stock.out
replace into stock
for exception stockexcept
During the load phase of the process, the load utility also collects table statistics if you specify
statistics . You can either collect statistics using the statistic profile with
the statistics use profile option, or specify not to collect statistics with the
statistics no option. A
statistic profile is a set of options that specify which statistics are to be collected, such as table, index, or distribution statistics.
If you choose not to collect statistics during the load, you should always update the statistics at the earliest convenient time. When large amounts of new data are inserted into a table, you should update the statistics to reflect the changes so that the optimizer can determine the most optimal access plan.
Recall that changes made to the target tables during the load are not logged. This is one of the characteristics of the load utility that improves performance. However, it also takes away the ability to perform roll forward recovery for the load operation. DB2 puts the table space where the target table resides in backup pending state when the load operation begins. After the load completes, you must back up the table space or database. This ensures that the table space can be restored to the point where logging is resumed if you ever need to restore the table space restore. This is the behavior of the load option
copy no . You can also specify
copy yes if archival logging is enabled. With
copy yes , a copy of the loaded data will be saved and the table space will not be in backup pending state upon load completion. However, this negatively impacts the performance of the load. Table space status related to load operation will be discussed later in this chapter.
When you cannot afford to have a window to perform a table space backup after the load is complete but you also need the load to complete as fast as possible, neither
copy yes nor
copy no is a good solution. You may want to consider using the option
nonrecoverable if the target table can be recreated and data can be reloaded.
The
nonrecoverable option specifies that the target table is marked as nonrecoverable until the associated table space is backed up. In case of failure, such as disk or database failure, the table space needs to be restored and rolled back. The roll forward utility marks the data being loaded as
invalid and skips the subsequent transactions for the target table. After the roll forward operation is completed, the target table is not accessible and it can only be dropped. Note that other tables in the same table space are not affected by this option.
The load utility checks for invalid data and unique constraints during the load process. However, other constraints such as referential integrity and check constraints are not validated. DB2 therefore puts target tables defined with these constraints in check pending state. This forces you to manually validate the data before the tables are available for further processing.
The
set integrity command gives you the ability to do just that. The command can be as simple as the following, which immediately validates data against the constraints for table
stock .
set integrity for stock immediate checked
There are many other options; refer to the
DB2 UDB Command Reference for the complete syntax of the command.
You can further speed up the load performance by taking advantage of the extra hardware resources you might have on the machine. Table 12.7 lists options and modifiers you can use.
Performance-Related Modifiers | Description |
---|---|
DATA BUFFER | Specifies the number of 4KB pages to use as buffered space for transferring data within the load utility. |
SORT BUFFER | Specifies the amount of memory used to sort index keys during the load operation. |
CPU_PARALLELISM | Specifies the number of processes that the load utility will spawn for parsing, converting, and formatting records during the load operation. |
DISK_PARALLELISM | Specifies the number of processes that the load utility will spawn for writing data to the table space containers. |
FASTPARSE | Reduces syntax checking on input data. Note that this modifier may not detect invalid data. |
ANYORDER | Specifies that preserving source data order is not required. |
To perform a load, you must have SYSADM, DBADM, or LOAD authority. With the LOAD authority, you also need specific privileges on the target tables depending on the mode used in the
load command. For example, you need INSERT privileges on the table when the load utility is invoked in INSERT mode. If you use REPLACE mode, you need INSERT and DELETE privileges on the target table.
Note that you also need appropriate access to the exception table if one is specified. In addition, when using the
copy yes option, you need SYSADM, SYSCTRL, or SYSMAINT authority because a backup is performed during the load operation.
The Control Center provides a graphical tool to invoke a load operation. Right-click on the target table and select
Load to start the Load Wizard (Figure 12.27). The Load Wizard walks you through the process of loading a table.
During the phases of a load, the target table and its associated table spaces are in different states. By checking the state of the table and table space, you can tell which phase the load operation is currently in. Before introducing the tools to obtain this information, let's first discuss the different table and table space states.
Table 12.8 lists the states in which Tables can be placed by the database manager. You can control some of these; others are caused by the load utility.
Table State | Description |
---|---|
Normal | The table is in normal state. |
Check pending | Table is placed in check pending because it has constraints that have not yet been verified. When the load operation begins, it places tables with constraints (foreign key constraint and check constraint) in this state. |
Load in progress | Load is in progress on this table. |
A load operation has been activated on this table. However, it was aborted before data could be committed. Issue the load command with the terminate, restart , or replace option to bring the table out of this state. | |
Read access only | The table data is available for read access queries. Load operations using the allow read access option placed the table in this state. |
Unavailable | The table is unavailable. You can drop or restore it from a backup. Rolling forward through a non-ecoverable load operation will place a table in this state. |
Not load restartable | When information required for a load restart operation is unreliable, the table will be placed in this state. This prevents a load restart operation from taking place. For example, a table is placed in this state when a roll forward operation is started after a failed load operation that has not been successfully restarted or terminated. |
Type-1 indexes | Type-1 indexes are used in DB2 prior to Version 8. Tables currently using type-1 indexes can be converted to type-2 indexes using the REORG utility with CONVERT option. Type-2 indexes provide significant locking enhancements. They are also required to perform some online maintenance tasks, such as REORG. |
Unknown | The table state cannot be determined. |
Table 12.9 lists the states in which table spaces can be placed by the database manager.
Table State | Description |
---|---|
Normal | The table space is in normal state. |
Quiesced: SHARE | The table space has been quiesced in SHARED mode. |
Quiesced: UPDATE | The table space has been quiesced in UPDATE mode. |
Quiesced: EXCLUSIVE | The table space has been quiesced in EXCLUSIVE mode. |
Load pending | A table space is put in this state if a load operation has been active on one of its associated tables but has been aborted before data could be committed. |
Delete pending | A table space is put in this state if one of its associated tables is undergoing the delete phase of a load operation but has been aborted or failed. |
Backup pending | A table space is put in this state after a Point In Time roll forward operation, or after a load operation with the no copy option. You must back up the table space before using it. If it is not backed up, then you cannot update the table space, and only read-only operations are allowed. |
Roll forward in progress | A table space is put in this state when a roll forward operation on that table space is in progress. Once the roll forward operation completes successfully, the table space is no longer in roll forward-in-progress state. The table space can also be taken out of this state if the roll forward operation is cancelled. |
Roll forward pending | A table space is put in this state after it is restored or following an I/O error. After it is restored, the table space can be rolled forward to the end of the logs or to a Point In Time. Following an I/O error, the table space must be rolled forward to the end of the logs. |
Restore pending | A table space is put in this state if a roll forward operation on that table space is cancelled, or if a roll forward operation on that table space encounters an unrecoverable error, in which case the table space must be restored and rolled forward again. |
Load in progress | A table space is put in this state if it is associated with a load operation. The load in progress state is removed when the load operation is completed or aborted. |
Reorg in progress | An REORG operation is in progress on one of the tables associated to the table space. |
Backup in progress | A backup is in progress on the table space. |
Storage must be defined | For DB2 database manager internal use only. |
Restore in progress | A restore is in progress on the table space. |
Offline and not accessible | DB2 failed to access or use one or more containers associated to the table space, so the table space is placed offline. To take the table space out of this state, repair the containers. |
DB2 has two utilities that you can use to obtain the table state. Figure 12.28 presents the syntax diagram of one of them, the
load query command.
>>-LOAD QUERY--TABLE--table-name--+------------------------+----> '-TO--local-message-file-' >--+-------------+--+-----------+------------------------------>< +-NOSUMMARY---+ '-SHOWDELTA-' '-SUMMARYONLY-'
You can specify the following command to check the status of the load operation:
load query table stock to c:/stockstatus.out
The output file
stockstatus.out might look similar to Figure 12.29.
SQL3501W The table space(s) in which the table resides will not be placed in backuppending state since forward recovery is disabled for the database. SQL3109N The utility is beginning to load data from file "stock.del" SQL3500W The utility is beginning the "LOAD" phase at time "03-21-2002 11:31:16.597045". SQL3519W Begin Load Consistency Point. Input record count = "0". SQL3520W Load Consistency Point was successful. SQL3519W Begin Load Consistency Point. Input record count = "104416". SQL3520W Load Consistency Point was successful. SQL3519W Begin Load Consistency Point. Input record count = "205757". SQL3520W Load Consistency Point was successful. SQL3532I The Load utility is currently in the "LOAD" phase. Number of rows read = 205757 Number of rows skipped = 0 Number of rows loaded = 205757 Number of rows rejected = 0 Number of rows deleted = 0 Number of rows committed = 123432 Number of warnings = 0 Tablestate: Load in Progress
The
list utilities command displays the list of active utilities on the instance. Use the
show detail option to also display detailed progress information. Figure 12.30 illustrates sample output.
list utilities show detail ID = 1 Type = LOAD Database Name = SAMPLE Partition Number = 0 Description = OFFLINE LOAD Unknown file type AUTOMATIC INDEXING INSERT
COPY NO Start Time = 03/15/2004 00:41:08.767650 Progress Monitoring: Phase Number = 1 Description = SETUP Total Work = 0 bytes Completed Work = 0 bytes Start Time = 03/15/2004 00:41:08.786501 Phase Number [Current] = 2 Description = LOAD Total Work = 11447 rows Completed Work = 5481 rows Start Time = 03/15/2004 00:41:09.436920
The report generated in Figure 12.30 indicates that a load was performed on the database
sample and includes a brief description of the operation.
Progress Monitoring tells you the current phase of the load and the number of rows already loaded and to be loaded.
The table space in which the load target table resides will be placed in backup pending state if
COPY NO (the default) option is specified. The utility places the table space in this state at the beginning of the load operation. The table spaces stays in backup pending mode even when the load is complete until you perform a database or table space level backup.
Figure 12.31 shows how to retrieve the table space status.
list tablespaces show detail Tablespace ID = 2 Name = USERSPACE1 Type = System managed space Contents = Any data State = 0x0000 Detailed explanation: Backup pending Total pages = 527 Useable pages = 527 Used pages = 527 Free pages = Not applicable High water mark (pages) = Not applicable Page size (bytes) = 4096 Extent size (pages) = 32 Prefetch size (pages) = 16 Number of containers = 1