9.4 Analytics, OLAP, and Data Mining in the Database
Analysis of large data sets is faster
when it takes place where the data is stored (e.g. in the database).
This section describes the functions and other features available for
analysis and data mining in the database.
9.4.1 Functions
Oracle
releases dating back to Oracle8i have each added
more analytic and statistical functions as SQL extensions to the core
database. Including new functions added in Oracle Database
10g and previous database iterations, these
functions now include:Ranking functions
Used to
compute a record's rank with respect to other
records. Functions include RANK, DENSE_RANK, CUME_DIST, PERCENT_RANK,
NTILE, and ROW_NUMBER. Hypothetical ranking was an addition to
Oracle9i.
Windowing aggregate functions
Used to compute cumulative and
moving averages. Functions include SUM, AVG, MIN, MAX, COUNT,
VARIANCE, STDDEV, FIRST_VALUE, and LAST_VALUE.
LAG/LEAD functions
Often
used to compare values from similar time periods, such as the first
quarter of 1999 and the first quarter of 2000.
Reporting aggregate functions
Functions include SUM, AVG, MIN, MAX, COUNT, VARIANCE, STDDEV, and
RATIO_TO_REPORT.
Statistical aggregates
Examples include correlation, linear
regression, and variance.
Descriptive statistics
Examples include average, standard
deviation, variance, min, max, median (via a percentile count), mode,
group-by, and roll-up. Oracle Database 10g
also adds DBMS_STAT_FUNCS for summarizing
numerical columns in a table by applying various statistical
functions.
Correlations
Examples include
Pearson's correlation coefficients,
Spearman's and Kendall's.
Crosstabs
Examples
include enhancements with % statistics, chi squared, phi coefficient.
Hypothesis testing
Examples include T-test, F-test,
One-Way ANOVA, Chi-square, Mann Whitney, Kolmogorov-Smirnov.
Distribution fitting
Examples include normal, uniform,
Poisson, exponential, and Weibull.
Pareto analysis
Examples include 80/20 rule and
cumulative results table.
9.4.2 OLAP and Data Mining Options
For stored cubes
(objects with predefined multidimensional joins), facts, and
dimensions in the relational database, Oracle introduced an OLAP
Option in Oracle9i that is
replacing Express Server, a separate Oracle
multidimensional OLAP engine. OLAP capabilities provided by the
option are accessed through a Java API or directly through SQL.
Oracle9i also first introduced data mining
algorithms in the database accessible via a Java API. These are
described more in Section 9.6 of this chapter.
9.4.3 Other Capabilities
Oracle
Database 10g adds the MODEL clause as an
extension to the SELECT statement. This clause enables
relational data to be treated as multidimensional arrays (much like
spreadsheets) and is also used to define formulas for the arrays.
Finally, Oracle Database 10g also adds support
for IEEE floating-point number types, important in many of these
calculations.
9.4.4 Database Extensibility and the Data Warehouse
A growing trend in data warehousing is the storage of multiple
datatypes within the database. Although we discuss database options
in Chapter 13, we'll quickly mention how these
options might be useful in data warehousing.
9.4.4.1 interMedia
The interMedia
feature opens up the possibilities of including documents, audio,
video, and some locator functions in the warehouse. Of these, text
retrieval (Oracle Text) is most commonly used in warehouses today.
However, the number of companies storing other types of data in
warehouses, such as images, is growing. Often, storage of these types
of data is driven by a need to provide remote users with access.
9.4.4.2 Spatial option
The
Spatial
option is also relevant in a data warehouse in which data is
retrieved based on proximity to certain locations. Spatial data
includes some type of geographic coordinates. Typically, companies
use add-on products in conjunction with Oracle's
Spatial option. An example of this option's use for
data warehousing is a marketing analysis application that determines
the viability of retail outlets at various locations.
9.4.4.3 XML
Oracle added native XML datatype support to the
Oracle9i database and XML and SQL
interchangeability for searching. Oracle is providing key technology
in the development of the XQuery standard, a prototype of which was
first made available for Oracle9i.