Oracle Essentials [Electronic resources] : Oracle Database 10g, 3rd Edition

Jonathan Stern

نسخه متنی -صفحه : 167/ 84
نمايش فراداده

9.2 Data Warehouse Design

The database serves as the foundation of the data warehouse: it is the place where the data is stored. But there is more to a data warehouse than simply dataa data warehouse becomes useful only when business users properly use data stored within it. This may seem like a trivial point, but we've seen numerous companies build data warehouses without consulting business users to determine business needs; thus, the database ends up with very few users, little activity, and no business intelligence is gained.

Assuming that your warehouse is well planned and there is a demand for the data, your next challenge will be to figure out how to handle the demand. You will be faced with the need to design your data warehouse to deliver appropriate performance to your usersperformance that may initially seem far beyond the capabilities of your system, because the information requested from the data warehouse can involve summaries and comparisons of massive amounts of detailed data.

When you start designing the data warehouse, also remember that a data warehouse is never complete. When the business changes, so too must the data warehouse. Thus, the ability to track changes through metadata stored in a repository often becomes critical in design phases.

Various design tools can provide this capability. Oracle's Warehouse Builder provides a metadata repository and also the capability to import metadata from operational tables and then forward-engineer new schema and tables. A data warehouse designer creates columns for the new tables and builds constraints for the new schema. Maps are then created between source and target columns with appropriate transformations. DML scripts for creation of new tables, and PL/SQL or SQL*Loader scripts for extraction, transformation, and loading (ETL) are automatically generated.

As noted previously, data warehouses historically have had a different set of usage characteristics from those of an OLTP database. One aspect that makes it easier to meet data warehousing performance requirements is the high percentage of read operations. Oracle's locking model, described in detail in Chapter 7, is ideally suited to support data warehouse operations. Oracle doesn't place any locks onto data that's being read, thus reducing contention and resource requirements for situations in which there are a lot of database reads. Oracle is consequently well-suited to act as the repository for a data warehouse. Because locks don't escalate, Oracle is also extremely appropriate for near real-time data feeds into the warehouse in a scenario not unlike some OLTP implementations.

Warehousing usage characteristics also lead to different types of design schema. In OLTP databases, transaction data is usually stored in multiple tables and data items are stored only once. If a query requests data from more than one transaction table, the tables are joined together. Typically, the database query optimizer decides which table to use as the starting point for the join, based on the assumption that the data in the tables is essentially equally important.

A warehouse schema may use a different model. Although Oracle-based data warehouses may be modeled as third normal form (3NF), when business users need an understandable schema to formulate their own ad-hoc queries or analytical processing is required, key transaction data may be more appropriately stored in a central fact table, surrounded by dimension or look-up tables, as shown in Figure 9-2. The fact table can contain summarized data for data items duplicated elsewhere in the warehouse, and dimension tables can contain multiple hierarchies. As companies consolidate their data marts into enterprise data warehouses, many now deploy a variation called a hybrid schema, a mixture of third normal form and star schema.

Ralph Kimball, author of the widely read book The Data Warehouse Toolkit (see Appendix B for details), is largely credited with discovering that users of data warehouses typically pose their queries in such a manner that the star schema, shown in Figure 9-2, is an appropriate model to use. A typical query might be something like the following:

Show me how many sales of computers (a product type) were sold by a store chain (a channel) in Wisconsin (a geography) over the past 6 months (a time).

The schema in Figure 9-2 shows a relatively large sales transactions table (called a fact table) surrounded by smaller tables (called dimensions or look-up tables). The query just described is often called multidimensional, because several dimensions are included (time is almost always one of them). Because these queries are typical in a data warehouse, the recognition of the star schema by a cost-based optimizer can deliver enormous performance benefits.

Figure 9-2. Typical star schema