2.2 Tables
Tables are
the fundamental core of a relational database. Relational theory
describes tables as abstract objects, ascribing no significance to
the order of the rows or the columns that make up a table. However,
tables also exist in physical form on disk in your database server,
with physical ordering that affects performance. When an application
queries for those physical bytes stored on disk or cached in memory,
the server processes must have some way to reach them.
The physical layout of table rows
affects the performance of reads of those rows, so it is important to
understand the types of tables and how they affect the layout. Figure 2-2 shows four different physical tables,
illustrating four basic patterns of table growth and aging, and shows
how these patterns affect data layouts.
Figure 2-2. Physical table growth and aging

Tables occupy one or more contiguous areas of disk space (called
extents
on Oracle[1]) that the server
can read with minimal read-head movement and maximal efficiency. The
database organizes table rows in
blocks, which are too
small to show here, usually 2KB-16KB in size. These blocks are
constant-sized across most or all of the database (depending on the
vendor). The blocks are the smallest units that the database reads
from disk and caches, as discussed earlier. As formerly empty blocks
within an extent become occupied, the
high-water
markthe highest point of the table that the
database needs to scanrises toward the top of the extent
until, reaching the top, it triggers allocation of a new
extent.[2] Above the high-water mark is space
reserved for future writes, but not space that the database will ever
touch for a read. The high-water mark does not move downward unless
you rebuild or truncate the table. Figure 2-2
illustrates growth patterns described in the following sections.[1] Technically, an extent is contiguous
according to virtual disk addressing, which is all the database
software knows about. At a lower level, a RAID or other
disk-striping/mirroring system can translate these contiguous virtual
addresses to blocks on different disks, but you can safely ignore
this subtlety when tuning SQL.[2] This most closely describes Oracle, but the
differences between vendors, in this area, are not important to SQL
tuning.
2.2.1 Continuous Growth
The continuous
growth pattern, shown for T1 in Figure 2-2, is the most common pattern among transaction
tables, which continually acquire new rows but almost never lose old
rows. It is often regrettable that old rows stay around long after
they have outlived their usefulness, but deciding what is truly safe
to purge is hard (and scary) work, even ignoring the effort of
writing the routines to do the work. Somehow, this work always ends
up at the end of the priority list for a product's
features (and who needs it in the initial release of a product?),
much to the delight of disk vendors.In continuous growth tables, the
level of interest in rows tends to drop with their age, making the
newest rows, conveniently stored together at the top of the table and
most likely to be queried, best to cache. When the newest rows are
the hottest, the natural clustering of new rows makes optimum use of
the cache, and even a very large table can see a good
cache-hit ratio (the
fraction of logical I/Os that avoid physical I/Os) if you use indexed
access that avoids the older rows.A query that touches all of a continuous growth table (up to the
high-water mark, that is), then discards all but some of the newest
rows, will look good when the table is new and tiny. However,
whole-table-access runtime grows linearly, assuming a constant
table-growth rate, and will likely become intolerable. An access path
that touches only new rows will maintain roughly constant efficiency,
given steady table growth, since you will have a roughly constant
number of rows created in the last week or so.
2.2.2 Purge Eldest
The
purge eldest pattern, shown for T2 in Figure 2-2, I call the Ouroboros
pattern, after the mythical snake that continually eats its own tail.
In this table, the oldest rows are periodically purged (all of the
oldest rows, not just some subset), completely emptying their blocks
and making them available for inserts of the newest rows. The
high-water mark need not move once this table reaches mature size,
assuming you delete rows (once you begin deletes) at the same rate
you insert them. The head of this snake (holding the newest rows) is
always chasing the tail (holding the oldest rows), which retreats
with every fresh delete.
From the point of view of keeping the
newest rows physically close together, this pattern is as good as the
continuous growth pattern, with the added advantage that, since table
growth halts once purging begins, the whole table has a better chance
to be well cached. Note that this is an idealized case that is rarely
seen, since retention of a few of the oldest rows, or a growth rate
that exceeds the purge rate, will tend to gradually mix old and new
rows ever more thoroughly.
2.2.3 Purge, Not by Age
The purge, not by
age pattern, shown for T3 in Figure 2-2,
reflects deletes that are not age-driven. Blocks become available for
fresh inserts as soon as they have empty space that exceeds some
threshold (typically 60% empty on Oracle), staying eligible until
empty space falls below another threshold (typically 10% empty on
Oracle). This happens to blocks that are scattered fairly randomly
throughout the table, so new rows scatter accordingly, and
caching becomes harder with time
both as a result of average blocks being emptier and as a result of
interesting rows being more spread out. However, such a pattern of
purging tends to imply that the level of interest in rows is not
age-related, tending to make such a table hard to cache regardless of
whether purging scatters new rows.
2.2.4 Complete Purge and Regrowth
The complete
purge and regrowth pattern, shown for T4 in Figure 2-2, reflects a wholesale purge with the
beginnings of regrowth. In Figure 2-2, the entire
table contents were recently deleted and the oldest rows shown are
not that old, since the table has a long way to grow before it
reaches its old high-water mark. This pattern is similar to the
continuous growth pattern shown in T1, except that, since the table
was formerly large and has not been rebuilt, the high-water mark has
not fallen back, forcing full table scans to do just as much physical
I/O as before the purge.
|
• Table of Contents• Index• Reviews• Examples• Reader Reviews• Errata• AcademicSQL TuningBy
