9.3 Query Optimization
Oracle first provided the ability to
recognize a star schema in the query optimizer in Oracle7 and has
focused on making its cost-based query optimizer smarter in
response to decision-support queries in subsequent database releases.
To further improve optimizer prediction accuracy, in Oracle Database
10g optimizer predictions are compared to actual
runtime performance and any errors are subsequently corrected
automatically.How does the optimizer handle a query against a
star schema? First, it finds a
sales transactions fact table (shown in Figure 9-2) with a lot more
entries than the surrounding dimension tables. This is the clue that
a star schema exists. As Oracle7 evolved, the optimizer began to
produce much smarter plans. The optimizer for a standard relational
database typically would have tried to join each of the dimension
tables to the fact table, one at a time. Because the fact table is
usually very large, using the fact table in multiple joins takes a
lot of time.Cartesian product joins were added to Oracle7 to first join the
dimension tables, with a subsequent single join back to the fact
table in the final step. This technique works relatively well when
there are not many dimension tables (typically six or fewer, as a
rule of thumb, to keep the Cartesian product small) and when data is
relatively well populated.In some situations, there are a fairly large number of dimension
tables or the data in the fact table is sparse. For joining such
tables, a parallel bitmap star join may be selected by the optimizer.
9.3.1 Bitmap Indexes
Bitmap indexes,
described in Chapter 4, were first introduced in
Oracle7 to speed up the type of data retrieval and joins in data
warehousing queries. Bitmap indexes in Oracle are typically
considered for columns in which the data has low cardinality.
Cardinality
is the number of different values in an index divided by the number
of rows. There are various opinions about what low cardinality
actually is. Some people consider cardinality as high as 10% to be
low, but remember that if a table has a million rows, that
"low" cardinality would mean
100,000 different values in a column!In a bitmap index, a value of 1 in the index indicates that a value
is present in a particular row and 0 indicates that the value is not
present. A bitmap is built for each of the values in the indexed
columns. Because computers are built on a concept of 1s and 0s, this
technique can greatly speed up data retrieval. In addition, join
operations such as AND become a simple addition operation across
multiple bitmaps. A side benefit is that bitmap indexes can provide
considerable storage savings.Figure 9-3 illustrates the use of a bitmap index in a compound WHERE
clause. Bitmap indexes can be used together for even faster
performance. The bitmap indexes are essentially stacked together, as
a set of punch cards might be. Oracle simply looks for those parts of
the stack with all the bits turned on (indicating the presence of the
value), in the same way that you could try to stick a knitting needle
through the portions of the card stack that were punched out on all
of the cards.
Figure 9-3. Bitmap index operation in a compound WHERE clause

In Oracle, star-query performance is improved when bitmap indexes are
created on the foreign-keys columns of the fact table that link to
the surrounding dimension tables. A parallel bitmap star join occurs
in which the bitmaps retrieve only the necessary rows from the fact
table and the rows are joined to the dimension tables. During the
join, sparseness (i.e., a large quantity of empty values) is
recognized inherently in the bitmaps, and the number of dimension
tables isn't a problem. This algorithm can also
efficiently handle a snowflake schema,
which is an extension of a standard star schema in which there are
multiple tables for each dimension. To further speed queries,
Oracle9i added a bitmap join index from fact
tables to dimension tables.Performing queries in parallel also obviously improves performance.
Joins and sorts are frequently used to solve decision-support
queries. Parallelism is described in Chapter 6. That chapter lists
all of the applications that Oracle can perform in parallel (see
"What Can Be Parallelized?").
9.3.2 Real Application Clusters
Real Application Clusters, which
replaced Oracle Parallel Server as of Oracle9i,
further expands parallelism by enabling queries to transparently
scale across nodes in clusters or in grids of computer systems.
9.3.3 Summary Tables
Data within dimensions is usually
hierarchical in nature (e.g., in the time dimension, day rolls up to
week, which rolls up to month, which rolls up to quarter, which rolls
up to year). If the query is simply looking for data summarized at a
monthly level, why should it have to sort through more detailed daily
and weekly data? Instead, it can simply view data at or above that
level of the hierarchy. Formerly, data warehousing performance
consultants designed these types of summary tablesincluding
multiple levels of precalculated summarization. For example, all the
time periods listed in Figure 9-2 can be calculated on the fly using
different groupings of days. However, to speed queries based on a
different time series, a data warehouse can have values precalculated
for weeks and months and stored in summary tables to which queries
can be redirected.
9.3.4 Materialized Views
Oracle8i introduced the concept of
materialized views for the creation of summary tables for
facts and dimensions that can represent rollup levels in the
hierarchies. A materialized view provides precomputed summary data;
most importantly, a materialized view is automatically substituted
for a larger detailed fact table when appropriate. The cost-based
query optimizer can perform query rewrites to these summary tables
and rollup levels in the hierarchy transparently, often resulting in
dramatic increases in performance. For instance, if a query can be
answered by summary data based on sales by month, the query optimizer
will automatically substitute the materialized view for the more
granular table when processing the query. A query at the quarter
level might use monthly aggregates in the materialized view,
selecting the months needed for the quarter(s). There are plans
beyond the first release of Oracle Database 10g
to add query rewrite capabilities with which the optimizer can make
use of multiple appropriate materialized views.