Indexes
Index isanother database physical structure that occupies disk space in a way similar
to that of a table. The main difference is that indexes are hidden from users
and are not mentioned in any DML statements, even though they are often used
behind the scene.Assume our
PHONE table created previously in this
chapter has been populated with records. Assume you issue the following query:
SELECT * FROM phone WHERE
PHONE_CUSTID_FN = 152
The RDBMS first checks if the index exists
on the condition column (PHONE_CUSTID_FN). If the
answer is yes, the index is used to determine the physical location of the
corresponding rows (i.e., the two rows with
PHONE_CUSTID_FN
=
152). If no index is found on the column,
the whole table is scanned to find rows with appropriate values.A database index is similar to an index at
the end of a book — it stores pointers to the physical row locations on the
disk in the same way a book's index points to the page numbers for the
appropriate topics. From another viewpoint, it is similar to a database table
with two or more columns: one for the row's physical address, and the rest for
the indexed table columns. In other words, index tells the RDBMS where to look
for a specific table row (or a group of rows) on the disk as illustrated on
Figure
4-4.

Figure 4-4: Index
use
As you can see, the table column and the
index have the same set of values; the main difference is that in the index,
these values are sorted, so it takes much less time to find what you
need.In most databases indexes are implemented
as B-Tree indexes, that is, they use the so called B-Tree algorithm that
minimizes the number of times the hard disk must be accessed to locate a
desired record, thereby speeding up the process. Because a disk drive has
mechanical parts, which read and write data far more slowly than purely
electronic media, it takes thousands of times longer to access a data element
from a hard disk as compared with accessing it from RAM.B-Trees save time by using nodes with many
branches (called children). The simplest version of a B-Tree is called a binary
tree because each node has only two children.
Figure
4-5 illustrates a search for the value 100 using a binary tree. The
algorithm is very simple. Starting at the top, if the top node value is less
than what you are looking for, move to the left; if it's greater than 100, go
to the right, until the value is found.

Figure 4-5: B-Tree
example
Indexes can be created to be either unique
or nonunique. Unique indexes are implicitly created on columns for which a
PRIMARY
KEY or a
UNIQUE constraint is specified. Duplicate
values are not permitted. Nonunique indexes can be created on any column or
combination of columns without any regard to duplicates.Indexes can be created on one column or on
multiple columns. The latter can be useful if the columns are often used
together in
WHERE clauses. For example, if some
frequently used query looks for a certain customer's orders created on a
certain date, you may create a nonunique index on the
ORDHDR_CUSTID_FN and
ORDHDR_CREATEDATE_D columns of the
ORDER_HEADER table.SQL99 does not specify any standards for
indexes (or even require their existence at all), but practically all database
vendors provide mechanisms to create indexes, because without them any
production database would be unbearably slow.There is no universal rule on when to
create indexes, but some general recommendations can be given.
It usually does not make much sense to create indexes on small
tables — they can degrade performance rather than improve it. If the table is
only, say, 50 rows long, it might be faster to scan it than to use the B-Tree
algorithm.
On large tables, indexes should be created only if the queries
that involve indexed column(s) retrieve a small percentage of rows (usually
under 15 percent).
Indexes are usually helpful on columns used in table joins.
(Primary keys and unique columns are indexed by default; it is often not such a
bad idea to index foreign key columns also.)
Indexes slow down DML operations that involve indexed columns —
for example, if you update a value on such column, the index column has also to
be updated; and if you insert a row, the corresponding index(es) may have to be
re-sorted. So if a table is likely to be subjected to frequent updates,
inserts, and/or deletes, it is recommended to have fewer indexes.
CREATE INDEX statement
The
CREATE
INDEX statement differs slightly for
different implementations. The somewhat simplified syntax is below.
Oracle
9i
CREATE [UNIQUE | BITMAP]
INDEX [<schema>.]<index_name> ON
[<schema>.]<table_name> ({<column> |
<column_expression> }[ASC | DESC],...)
[<physical_parameters>];
This statement creates a unique index
on the
IDX_CUST_NAME on the
CUST_NAME_S column of the
CUSTOMER table with column values
stored in descending order:
CREATE UNIQUE INDEX
idx_cust_name ON CUSTOMER(cust_name_s DESC)
Function-based indexes
Oracle 9i also
lets you create so-called function-based indexes where instead of a column you
can specify a column expression (a deterministic function). For example, you
know that customers often query the
CUSTOMER table using the
LOWER function in the
WHERE clause:
... WHERE
LOWER(cust_name_s) = 'boswell designs corp.' ...
In this situation, a function-based
index might make sense:
CREATE UNIQUE INDEX
idx_cust_name ON CUSTOMER(LOWER(cust_name_s))
Bitmap indexes
The
BITMAP keyword indicates that an
index is to be created with a bitmap for each distinct key rather than indexing
each row. Oracle recommends creating bitmap indexes on columns with low
cardinality, that is, columns such as hypothetical
GENDER and
MARITAL_STATUS columns that would
likely have few distinct values.In the ACME database, order status
can be a goof candidate for a bitmap index because there are only four possible
statuses for an order (COMPLETE,
INVOICED,
SHIPPED, and
CANCELLED). The example below creates
bitmap index
IDX_ORDHDR_STATUS on the
ORDHDR_STATUSID_FN column of the
ORDER_HEADER table:
CREATE BITMAP INDEX
idx_ordhdr_status ON order_header (ordhdr_statusid_fn)
Note | Function-based and bitmap indexes are available in Oracle Enterprise Edition only, so you will get an error if you try to execute the foregoing statement on the Personal or Standard versions. |
Physical storage clause
As we mentioned before, you can
specify separate tablespaces for table data and table indexes (and it does make
sense from the database performance point of view). The syntax is the same as
for
CREATE
TABLE (the example below assumes
tablespace
INDEX01 exists in your database):
CREATE UNIQUE INDEX
idx_cust_name ON CUSTOMER(cust_name_s) TABLESPACE INDEX01
DB2 UDB 8.1
Here is the simplified DB2 syntax to
create an index:
CREATE [UNIQUE] INDEX
[<schema>.]<index_name> ON [<schema>.]<table_name>
(<column_name> [ASC | DESC],...)
This statement creates unique index
IDX_CUST_NAME_ALS on
CUST_NAME_S and
CUST_ALIAS_S columns of
CUSTOMER table with column values
stored in default (ascending) order:
CREATE UNIQUE INDEX
idx_cust_name_als ON CUSTOMER(cust_name_s, cust_alias_s)
Clustered Indexes
MS SQL Server allows you to create
clustered indexes by specifying
CLUSTERED keyword. (See the
CREATE
INDEX syntax described
previously.)The concept is very similar to one
with index-organized tables in Oracle — the actual table rows are with the
index and the physical order of rows is the same as their indexed order; that
is, the rows are re-sorted every time a new one is inserted (or deleted). Only
one clustered index is allowed on a table (or view) at a time; thus, you can
create a clustered index on a column, drop it later, and create a clustered
index on another column, but you can't create two clustered indexes on the same
table simultaneously as data in the table can only be physically organized in
one order.These statements create the table
SALESMAN and unique clustered index
on its
SALESMAN_CODE_S column.
CREATE TABLE SALESMAN (
SALESMAN_ID_N INTEGER NOT NULL, SALESMAN_CODE_S VARCHAR(2) NOT NULL,
SALESMAN_NAME_S VARCHAR(50) NOT NULL, SALESMAN_STATUS_S CHAR(1) DEFAULT 'Y',
CONSTRAINT CHK_SALESSTATUS CHECK (SALESMAN_STATUS_S in ('N', 'Y')) ) CREATE
UNIQUE CLUSTERED INDEX idx_sales_code ON SALESMAN
(salesman_code_s)
The default for a
CREATE
INDEX statement is
NONCLUSTERED except for indexes on
the primary key columns that are automatically created with
CLUSTERED.
Note | You cannot specify a physical location for the index in a CREATE INDEX statement in DB2; all indexes for a table will be created either in the default tablespace or in a tablespace specified in the CREATE TABLE or ALTER TABLE INDEX IN clause. |
MS SQL Server 2000
To create an index in MS SQL Server,
use this syntax:
CREATE
[UNIQUE] [CLUSTERED | NONCLUSTERED] INDEX <index_name> ON
<table_name> | <view_name> ( column [ ASC | DESC ],...) [ON
filegroup]
The syntax is very similar to DB2. You
can specify a filegroup to create the index on. This statement creates the
unique index
IDX_CUST_NAME_ALS on the
CUST_NAME_S and
CUST_ALIAS_S columns of the
CUSTOMER table physically stored on
filegroup
INDEX01 (assuming it exists):
CREATE UNIQUE INDEX
idx_cust_name_als ON CUSTOMER(cust_name_s, cust_alias_s) ON
INDEX01
Tip | In MS SQL Server you can create indexes on views. The view definition must be deterministic and schema bound. |