6.3 Oracle and Parallelism
The ability to parallelize operations
is one of the most important features of the Very Large
Database (VLDB). Database servers with multiple CPUs, which are
called symmetric
multiprocessing (SMP) machines, are the norm today for
most database servers. As performance demands increase and data
volumes continue to grow, you will increasingly need to use multiple
processors and disks to reduce the time needed to complete a given
task. Oracle supports parallelism within
a single SMP server as well as parallelism across multiple nodes,
using Oracle Parallel Server/Real Application
Clusters. Executing a SQL statement in parallel will consume more of
all three machine resourcesCPU, memory, and disk I/Obut
complete the overall task faster.Parallelism affects the amount of memory and CPU resources used to
execute a given task in a fairly linear fashionthe more
parallel processes used, the more resources consumed for the
composite task. Each parallel execution process has a Program Global
Area (PGA) that consumes memory and performs work. Each parallel
execution process takes its own slice of CPU, but more parallel
processes can reduce the total amount of time spent on disk I/O,
which is the place in which bottlenecks can most readily appear.Two types of parallelism are possible within an Oracle database:Block-range parallelism
Driven by ranges of database blocks
Partition-based parallelism
Driven by the number of partitions or subpartitions involved in the
operation
The following sections describe these types of parallelism.
6.3.1 Block-Range Parallelism
In 1994, Oracle 7.1 introduced the
ability to dynamically parallelize table scans and a variety of
scan-based functions. This parallelism was based on the notion
of block ranges, in which the Oracle server would
understand that each table contained a set of data blocks that
spanned a defined range of data. Oracle7 implemented block-range
parallelism by dynamically breaking a table into pieces, each of
which was a range of blocks, and then used multiple processes to work
on these pieces in parallel. Oracle's implementation
of block-range parallelism was unique in that it
didn't require physically partitioned tables to
achieve parallelism.With block-range parallelism, the client session that issued the SQL
statement transparently becomes the parallel execution coordinator,
dynamically determining block ranges and assigning them to a set of
parallel execution (PE) processes.
Once a PE process has completed an assigned block range, it returns
to the coordinator for more work. Not all I/O occurs at the same
rate, so some PE processes may process more blocks than others. This
notion of "stealing work" allows
all processes to participate fully in the task, providing maximum
leverage of the machine resources.Block-range parallelism scales linearly
based on the number of PE processes, provided you have adequate
hardware resources. The key to achieving scalability with parallelism
lies in hardware basics. Each PE process runs on a CPU and requests
I/O to a device. If you have enough CPUs reading enough disks,
parallelism will scale. If the system encounters a bottleneck on one
of these resources, scalability will suffer. For example, four CPUs
reading two disks will not scale much beyond the two-way scalability
of the disks and may even sink below this level if the additional
CPUs cause contention for the disks. Similarly, 2 CPUs reading 20
disks will not scale to a 20-fold performance improvement. The system
hardware must be balanced for parallelism to scale.Most large systems have far more disks than CPUs. In these systems,
parallelism results in a randomization of I/O across the I/O
subsystem. This is useful for concurrent access to data as PE
processes for different users read from different disks at different
times, resulting in I/O that is distributed across the available
disks.A useful analogy for dynamic parallelism is eating a pie. The pie is
the set of blocks to be read for the operation, and the goal is to
eat the pie as quickly as possible using a certain number of people.
Oracle serves the pie in helpings, and when a person finishes his
first helping, he can come back for more. Not everyone eats at the
same rate, so some people will consume more pie than others. While
this approach in the real world is somewhat unfair,
it's a good model for parallelism because if
everyone is eating all the time, the pie will be consumed more
quickly. The alternative is to give each person an equal serving and
wait for the slower eaters to finish.Figure 6-5 illustrates the splitting of a set of blocks into ranges.
Figure 6-5. Dynamic block-range parallelism

6.3.2 Parallelism for Tables and Partitions of Tables
With partitioned
tables, introduced in Oracle8, an operation may involve
one, some, or all of the partitions of a partitioned table. There is
essentially no difference in how block-range parallelism dynamically
splits the set of blocks to be read for a regular table as opposed to
a partitioned table. Once the optimizer has determined which
partitions should be accessed for the operation, all the blocks of
all partitions involved are treated as a pool to be broken into
ranges.This assumption by the optimizer leads to a key consideration for
using parallelism and partitioned tables. The degree of parallelism
(i.e., the number of parallel execution processes used for the table
as a whole) is applied to the set of partitions that will be used for
an operation. The optimizer will eliminate the use of partitions that
do not contain data an operation will use. For instance, if one of
the partitions for a table contains ID numbers below 1,000, and if a
query requests ID numbers between 1,100 and 5,000, the optimizer
understands that this query will not access this partition.Since Oracle9i, you can also partition tables
based on a list of specific values, although this type of
partitioning is typically used to partition tables for maintenance
operations.The use of a subset of partitions has important implications for
setting a degree of parallelism that provides good scalability for
queries that scan some versus all of a table's
partitions. A level of parallelism that may be good for the
partitioned table as a whole may cause I/O contention for a query
that accesses only one partition stored on fewer disks, as multiple
parallel processes try to access the same disk.If you expect that your queries will use partition elimination or
pruning and you plan on using parallelism, you should stripe each
partition over a sufficient number of drives to scale effectively.
This will ensure scalability regardless of the number of partitions
accessed. This striping can be achieved manually through the use of
multiple datafiles on multiple disks, through the use of striped
arrays, or through a combination of both approaches.
6.3.3 What Can Be Parallelized?
Many people think that Oracle can
parallelize only simple table scans. While the parallelization of a
table scan is involved in most parallelized operations, Oracle
actually can parallelize far more than simple scans. The list of
operations that can be parallelized using block-range parallelism
includes the following:Tablespace creationIndex creation and rebuildsOnline index reorganizations and rebuildsIndex organized table reorganizations and movementsTable creation, such as summary creation using CREATE TABLE...AS
SELECTPartition-maintenance operations, such as moving and splitting
partitionsData loadingImposing integrity constraintsStatistics gathering (automatically gathered in Oracle Database
10g)Backups and restoresDML operations (INSERT, UPDATE, DELETE)Query processing operations
The specific features of query
processing that may be parallelized include:Table scansNested loopsSort merge joinsHash joinsBitmap star joinsIndex scansPartition-wise joinsAnti-joins (NOT IN)SELECT DISTINCTUNION and UNION ALLORDER BYGROUP BYAggregationsUser-defined functions
Oracle Database 10g has added the ability to run
the OLAP AGGREGATE statement and the Import utility in parallel.
6.3.4 Controlling Oracle's Parallel Resource Usage
An Oracle instance has a pool of
parallel
execution (PE) processes that are available to the database users.
Controlling the maximum number of active PE processes is important;
too many PE processes will overload the machine, leading to resource
bottlenecks and performance degradation. The pool of PE processes is
governed by the following two initialization parameters:PARALLEL_MIN_SERVERS
A number that acts as a floor for the pool of PE processes. When the
instance is started, it will spawn this number of PE processes. A
typical value is twice the number of CPUs.
PARALLEL_MAX_SERVERS
A number that acts as a ceiling for the pool of PE processes. As
users request parallel operations, the instance will spawn additional
PE processes up to this value. This limits the maximum parallel
activity on the system and can be used to avoid overloading the
machine. The effective ceiling for a given machine will vary. If the
load decreases, the instance will gradually scale the pool back to
PARALLEL_MIN_SERVERS.
A user session requests a number of PE processes from the pool based
on the degree of parallelism for the operation the session is
performing.
6.3.4.1 Setting the degree of parallelism
The degree of parallelism
governs how many parallel-execution processes can be used for a given
operation. The degree of parallelism is determined in three mutually
exclusive ways:If a table has default parallelism enabled using ALTER TABLE
tablename PARALLEL, the optimizer will set the
degree of parallelism automatically. In Oracle 7.3 and 8.0.3, default
parallelism was set to the minimum of two values: the number of CPUs
in the machine or the number of distinct devices used to store the
table.[3] Based on customer feedback, in Oracle
8.0.4 (and subsequent releases) default parallelism defaults to the
number of CPUs to recognize the prevalent use of striped disk arrays.
Large tables in which parallelism is useful are typically stored on
one or more striped disk arrays so that the resulting number of
actual spindles used is greater then the number of CPUs. CPU count
represents a more realistic value for default parallelism.[3] If you use operating-system or disk-subsystem
striping to stripe a datafile over more than one disk, this striped
datafile presents itself to Oracle as a single datafile on a single
device. Using a single array of five disks will not equate to five
devices for default parallelism. Using five datafiles, each on a
separate array, will. So, take care when using default parallelism
and transparent or OS-level disk striping in these earlier Oracle
versions. You can set the parallelism for a table to an explicit value using
ALTER TABLE tablename PARALLEL
N, where N is the desired
degree of parallelism. This technique is useful where you want to
override the optimizer's default value because of
hardware characteristics. For example, you should set the parallelism
if the machine can handle a degree of twice the number of CPUs for a
particular table.You can override any table-level setting and explicitly set the
degree using the PARALLEL optimizer hint for queries or the
PARALLEL
clause, which is valid for some statements, such as CREATE INDEX or
CREATE TABLE AS SELECT. This is useful if you want to use more
resources for a particular task, such as an index build, but leave
the degree of parallelism for other operations unchanged.
6.3.4.2 Intra-operation parallelism
For any operation involving more than one step, such as a scan and
sort, a degree of parallelism of N will actually
use 2 x N PE processes, one set per
suboperation. For example, N scanner processes
will feed N sorter processes. At the end of the
overall task, the coordinator combines the results from the separate
PE processes. Figure 6-6 illustrates transparent parallelism within
and between sets of PE processes.
Figure 6-6. Intra-operation and inter-operation parallelism

6.3.5 Adaptive Parallelism
Determining the optimal degree of parallelism in the presence of
multiple users and varying workloads has proven challenging. For
example, a degree of 8 for a query may provide excellent performance
for 1 or 2 users, but what if 20 users query the same table? This
scenario would call for 160 PE processes (8 PEs for each of the 20
users), which could overload the machine. Setting the degree to a
lowest common denominator value (for example, 2) provides effective
parallelism for higher user counts, but does not leverage resources
fully when fewer users are active.Oracle8i introduced the notion of
self-tuning adaptive parallelism. This feature
automatically scales down parallelism as the system load increases
and scales it back up as the load decreases. When an operation
requests a degree of parallelism, Oracle will check the system load
and lower the actual degree the operation uses to avoid overloading
the system. As more users request parallel operations, the degree
they receive will become lower and lower until operations are
executing serially. If activity decreases, subsequent operations will
be granted increasing degrees of parallelism. This adaptability frees
the DBA from the difficult task of trying to determine the optimal
degree of parallelism in the face of constant changes in workload.Adaptive parallelism takes two factors into account in determining
the degree of parallelism granted to an operation:System load.Parallelism resource limitations of the user's
consumer group if the Database Resource Manager is active. (The
Database Resource Manager is described later in this chapter.) This
is important, because it means that adaptive parallelism respects
resource plans if they're in place.
The following two initialization
parameters enable adaptive parallelism:PARALLEL_AUTOMATIC_TUNING
When set to TRUE, enables the adaptive parallelism feature.
PARALLEL_MIN_PERCENT
Determines what percentage of the requested parallelism an operation
must be granted. For example, suppose this parameter were set to 50
and an operation requested a degree of 8. If Oracle grants the
operation a degree of 4, the operation will continue as it received
the minimum 50% of the requested degree of parallelism. If Oracle
grants a degree less than 4, which would be less than 50% of the
requested degree of 8, an error will be returned to the user. If you
are using adaptive parallelism, set PARALLEL_MIN_PERCENT to 0 to
allow Oracle to scale parallelism down as needed.
6.3.6 Partition-Based Parallelism
A small subset of Oracle's parallel functionality is
based on the number of partitions or subpartitions accessed by the
statement to be parallelized. For block-range parallelism, the piece
of data each PE process works on is a range of blocks. For
partition-based parallelism, the pieces of data that drive
parallelism are partitions or subpartitions of a table. The
operations in which parallelism is based on the number of partitions
or subpartitions include the following:Updates and deletesIndex scansIndex creation and rebuilds on partitioned tables
6.3.6.1 Parallelism for partitions and subpartitions of a table
Oracle8 introduced support for parallel Data Manipulation Language
(DML), or the ability to execute INSERT, UPDATE, and DELETE
statements in parallel. This type of parallelism improves the
performance of large bulk operations (for example, an update to all
the rows of a very large table).In Oracle8 the degree of parallelism for updates and deletes is tied
to the number of partitions involved, while in
Oracle8i and beyond the degree of parallelism
for updates and deletes is tied to the number of partitions or
subpartitions involved. A table with 12 partitions (for example, one
partition for each month of the year) can have a maximum degree of 12
for a parallel update or delete. An update to only one month of data
would have no parallelism because it involves only one partition. If
the table were created using Oracle's composite
partitioning (for example, with 4 hash subpartitions by PRODUCT_ID
within each month), the maximum degree of parallelism for the entire
table would be 48, or 12 partitions with 4 subpartitions each. An
update to one month of data could have a degree of 4 because each
month contains four hash subpartitions. If the table is not
partitioned, Oracle cannot perform updates or deletes in parallel.Oracle8 and more recent releases can execute index creation, index
rebuilds, and index scans for partitioned indexes in parallel using
the same semantics as parallel DML: one PE process per partition or
subpartition of the index.
6.3.6.2 Fast full index scans for nonpartitioned tables
People often assume that the Oracle database can parallelize index
scans only if the target index is partitioned. Oracle 7.3 introduced
the ability to perform parallel index scans on nonpartitioned indexes
for a certain case. If the index scan operation were
"unbounded," meaning that the
entire index was going to be accessed to satisfy the query, then
Oracle 7.3 and higher would use block-range parallelism to access the
entire index in parallel. While Oracle can perform index scans for
nonpartitioned indexes, this feature applies to a narrow set of
queries. Partition-based index scans apply to a much broader range of
queries.
6.3.6.3 Parallel insert for nonpartitioned and partitioned tables
Oracle can execute an INSERT statement of the form INSERT INTO
tableX SELECT...FROM tableY
in parallel for nonpartitioned and partitioned tables. Oracle uses a
set of PE processes executing block-range parallelism for the SELECT
portion of the INSERT statement. These PE processes pass the rows to
a second set of PE processes, which insert the rows into the target
table. The target table can be a nonpartitioned or partitioned table.
Parallelism for an insert is not exactly block-range or
partition-based.