Oracle Essentials [Electronic resources] : Oracle Database 10g, 3rd Edition

Jonathan Stern

نسخه متنی -صفحه : 167/ 85
نمايش فراداده

9.3 Query Optimization

Oracle first provided the ability to recognize a star schema in the query optimizer in Oracle7 and has focused on making its cost-based query optimizer smarter in response to decision-support queries in subsequent database releases. To further improve optimizer prediction accuracy, in Oracle Database 10g optimizer predictions are compared to actual runtime performance and any errors are subsequently corrected automatically.

How does the optimizer handle a query against a star schema? First, it finds a sales transactions fact table (shown in Figure 9-2) with a lot more entries than the surrounding dimension tables. This is the clue that a star schema exists. As Oracle7 evolved, the optimizer began to produce much smarter plans. The optimizer for a standard relational database typically would have tried to join each of the dimension tables to the fact table, one at a time. Because the fact table is usually very large, using the fact table in multiple joins takes a lot of time.

Cartesian product joins were added to Oracle7 to first join the dimension tables, with a subsequent single join back to the fact table in the final step. This technique works relatively well when there are not many dimension tables (typically six or fewer, as a rule of thumb, to keep the Cartesian product small) and when data is relatively well populated.

In some situations, there are a fairly large number of dimension tables or the data in the fact table is sparse. For joining such tables, a parallel bitmap star join may be selected by the optimizer.

9.3.1 Bitmap Indexes

Bitmap indexes, described in Chapter 4, were first introduced in Oracle7 to speed up the type of data retrieval and joins in data warehousing queries. Bitmap indexes in Oracle are typically considered for columns in which the data has low cardinality. Cardinality is the number of different values in an index divided by the number of rows. There are various opinions about what low cardinality actually is. Some people consider cardinality as high as 10% to be low, but remember that if a table has a million rows, that "low" cardinality would mean 100,000 different values in a column!

In a bitmap index, a value of 1 in the index indicates that a value is present in a particular row and 0 indicates that the value is not present. A bitmap is built for each of the values in the indexed columns. Because computers are built on a concept of 1s and 0s, this technique can greatly speed up data retrieval. In addition, join operations such as AND become a simple addition operation across multiple bitmaps. A side benefit is that bitmap indexes can provide considerable storage savings.

Figure 9-3 illustrates the use of a bitmap index in a compound WHERE clause. Bitmap indexes can be used together for even faster performance. The bitmap indexes are essentially stacked together, as a set of punch cards might be. Oracle simply looks for those parts of the stack with all the bits turned on (indicating the presence of the value), in the same way that you could try to stick a knitting needle through the portions of the card stack that were punched out on all of the cards.

Figure 9-3. Bitmap index operation in a compound WHERE clause

In Oracle, star-query performance is improved when bitmap indexes are created on the foreign-keys columns of the fact table that link to the surrounding dimension tables. A parallel bitmap star join occurs in which the bitmaps retrieve only the necessary rows from the fact table and the rows are joined to the dimension tables. During the join, sparseness (i.e., a large quantity of empty values) is recognized inherently in the bitmaps, and the number of dimension tables isn't a problem. This algorithm can also efficiently handle a snowflake schema, which is an extension of a standard star schema in which there are multiple tables for each dimension. To further speed queries, Oracle9i added a bitmap join index from fact tables to dimension tables.

Performing queries in parallel also obviously improves performance. Joins and sorts are frequently used to solve decision-support queries. Parallelism is described in Chapter 6. That chapter lists all of the applications that Oracle can perform in parallel (see "What Can Be Parallelized?").

9.3.2 Real Application Clusters

Real Application Clusters, which replaced Oracle Parallel Server as of Oracle9i, further expands parallelism by enabling queries to transparently scale across nodes in clusters or in grids of computer systems.

Remember that these Oracle features use the cost-based optimizer and, prior to Oracle Database 10g, you should run statistics periodically (using the ANALYZE statement) on the tables to ensure good performance. Statistics gathering can be done in parallel.

In Oracle Database 10g, statistics gathering can be automatic and populates the Automatic Workload Repository. For example, the Oracle Database 10g SQL Tuning Advisor leverages this information when making tuning recommendations.

9.3.3 Summary Tables

Data within dimensions is usually hierarchical in nature (e.g., in the time dimension, day rolls up to week, which rolls up to month, which rolls up to quarter, which rolls up to year). If the query is simply looking for data summarized at a monthly level, why should it have to sort through more detailed daily and weekly data? Instead, it can simply view data at or above that level of the hierarchy. Formerly, data warehousing performance consultants designed these types of summary tablesincluding multiple levels of precalculated summarization. For example, all the time periods listed in Figure 9-2 can be calculated on the fly using different groupings of days. However, to speed queries based on a different time series, a data warehouse can have values precalculated for weeks and months and stored in summary tables to which queries can be redirected.

9.3.4 Materialized Views

Oracle8i introduced the concept of materialized views for the creation of summary tables for facts and dimensions that can represent rollup levels in the hierarchies. A materialized view provides precomputed summary data; most importantly, a materialized view is automatically substituted for a larger detailed fact table when appropriate. The cost-based query optimizer can perform query rewrites to these summary tables and rollup levels in the hierarchy transparently, often resulting in dramatic increases in performance. For instance, if a query can be answered by summary data based on sales by month, the query optimizer will automatically substitute the materialized view for the more granular table when processing the query. A query at the quarter level might use monthly aggregates in the materialized view, selecting the months needed for the quarter(s). There are plans beyond the first release of Oracle Database 10g to add query rewrite capabilities with which the optimizer can make use of multiple appropriate materialized views.