SQL Tuning

10.3 Tuned Queries that Return Few Rows, Slowly


Rarely, a
fully tuned query that returns few rows, even without aggregation,
runs slowly. Since tuned queries drive from the best filter, this
implies that such a query must reach many rows at some point in the
execution plan before later filters reduce the rowcount to something
moderate. The rules of thumb this book describes are designed to make
this a rare event; in my own experience, it has happened just once or
twice per year of tuning, on average.


10.3.1 Why Queries Sometimes Read Many Rows to Return Few



Modest-rowcount queries that are slow even after tuning can be traced to important filters that cannot all be applied before the execution plan reaches at least one large table. Consider, for
example, Figure 10-1. If the root detail table,
M, has 50,000,000 rows, then the detail join
ratios show that A1 and A2 have
10,000,000 rows each, and B1 and
B2 have 100,000 rows each. Robust, index-driven
execution plans with join order (B1, A1,
M, A2, B2) or (B2, A2,
M, A1, B1)
are equally attractive, by symmetry. For
concreteness, consider just the first join order. In that plan, you
reach 100 rows in B1, 10,000 rows in
A1, and 50,000 rows in M,
A2, and B2 before you discard
all but 50 rows for failing to meet the filter condition on
B2.


Figure 10-1. A slow query that returns few rows
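
To make the shape of the problem concrete, a query matching the join diagram of Figure 10-1 might look like the following sketch. The table names M, A1, A2, B1, and B2 come straight from the figure; every column name and bind variable is invented purely for illustration.

SELECT M.Some_Col, A1.Other_Col, B2.Another_Col
FROM M, A1, A2, B1, B2
WHERE M.A1_ID = A1.A1_ID          -- M joins to its masters A1 and A2
  AND M.A2_ID = A2.A2_ID
  AND A1.B1_ID = B1.B1_ID         -- A1 and A2 join to their masters B1 and B2
  AND A2.B2_ID = B2.B2_ID
  AND B1.Filter_Col = :b1_value   -- filter ratio 0.001 on B1
  AND B2.Filter_Col = :b2_value   -- filter ratio 0.001 on B2

Neither filter alone is selective enough to keep a robust, index-driven plan from touching about 50,000 rows in M.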


If you relax the robustness requirement, you can preread the 100 rows
that meet the filter in B2 and join those to the
earlier-reached rows by a hash join. You could even prejoin those 100
rows to A2 with nested loops and perform a hash
join between rows reached with nested loops to (B1, A1,
M)
and rows reached with nested loops that join the
combination (B2, A2) (assuming you could get your
optimizer to produce such an exotic plan). This would reduce the
rowcount you would need to reach A2 to 10,000,
while keeping the rowcount to reach B2 at 100.
However, none of these strategies eliminates the need to read 50,000
rows from the largest, least well-cached table, M,
and that will surely take longer than you would like for a query that
returns just 50 rows.
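
If the optimizer will not find such an exotic plan on its own, you can sometimes force its shape with an explicit rewrite. The following Oracle-style sketch reuses the invented names from the earlier sketch: it prejoins the 100 filtered B2 rows to A2 inside an inline view, then hash-joins that result to the rows reached with nested loops through (B1, A1, M). Whether these particular hints are honored depends on your database version, so treat this as an illustration of the plan's shape, not a recipe.

SELECT /*+ ORDERED USE_NL(A1 M) NO_MERGE(V) USE_HASH(V) */
       M.Some_Col, A1.Other_Col, V.Another_Col
FROM B1, A1, M,
     (SELECT /*+ ORDERED USE_NL(A2) */
             A2.A2_ID, B2.Another_Col
      FROM B2, A2
      WHERE A2.B2_ID = B2.B2_ID
        AND B2.Filter_Col = :b2_value) V  -- 100 B2 rows prejoined to 10,000 A2 rows
WHERE B1.Filter_Col = :b1_value
  AND A1.B1_ID = B1.B1_ID
  AND M.A1_ID = A1.A1_ID
  AND M.A2_ID = V.A2_ID

Even with this rewrite, the plan still reads roughly 50,000 rows from M, which is the part no restructuring of the joins can avoid.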

The root of this problem is a join diagram that has selective filters
distributed across two or more branches under some large table. The
result is that the ultimate level of filtration (a net filter ratio
equal to the product of the individual filter ratios) cannot even be
approached until after the database has already reached the large
table. The highly selective combination of filters is reached too
late in the execution plan to avoid excessive reads.


10.3.2 Optimizing Queries with Distributed Filters


To make optimum use of distributed filters, you need to somehow bring
them closer together in the query diagram, preferably into a single
table. Consider Figure 10-1 again. Assume that each
of the filters on B1 and B2 is
an equality condition on some column of that
filter's table. A normalized database design places
the filter column for B1 where it is because it
encodes a property that you need to specify only once per
B1 entity. Furthermore, this is a property that
you could not infer from knowledge of entities for any master tables
that join one-to-many to B1. If you placed this
column in A1, that would be a
denormalization, defined as a property you could
infer from knowledge of A1's
matching master entity stored in B1. Such a
denormalization would require redundant data to be stored in the
A1 table, since all 100 A1
entities (on average) that join to any particular
B1 row must have the same value for this inherited
column. However, in principle, all properties of master-table
entities are inherited properties of the detail entities that match
those master table entities.


The form of denormalization I describe is not the only form of
denormalization there is. Whole books are written on the subject of
normalizing databases, but this simplest form of denormalization is
the only form pertinent to this discussion.

For example, if Customer_ID is a property of
Orders, it is also an inherited property of the
Order_Details that match those orders. With
denormalization, you can always push properties up the join tree to
nodes above the nodes that own the properties under normalized
design. Such inheritance of filter properties need not stop at the
first level. For example, Customer_Name, a
normalized property of Customers, could be
inherited up two levels, through Orders, all the
way to Order_Details.
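
As a concrete sketch of such two-level inheritance, the copied column could be added and back-filled as shown below. The table names Customers, Orders, and Order_Details come from the discussion above; the join-key column names and the datatype are assumptions made only for illustration.

-- Add the inherited (denormalized) copy of Customer_Name two levels
-- up the join tree, on Order_Details.
ALTER TABLE Order_Details ADD (Customer_Name VARCHAR2(80));

-- One-time back-fill from the normalized source of the value.
UPDATE Order_Details OD
SET Customer_Name =
    (SELECT C.Customer_Name
     FROM Orders O, Customers C
     WHERE O.Order_ID = OD.Order_ID
       AND C.Customer_ID = O.Customer_ID);

A condition on Order_Details.Customer_Name can then replace the same condition on Customers, moving that filter all the way to the top of the join tree.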

This possibility of pushing filter columns up the join tree as high
as you wish points to the ultimate solution to performance problems
of distributed filters. To avoid distributed-filter problems, keep
pushing the most selective filter conditions upward until they come
together in a single node, which will inherit a combined filter that
has the product of the original filter selectivities. In the extreme
case, all filters rise all the way to the root detail node, and the
query will read only the few rows it ultimately returns from that
table and join downward to the same number of rows from the
downstream master tables. In the problem shown in Figure 10-1, table M acquires two
denormalized columns, from B1 and
B2, respectively. The combined filter on these two
columns has a selectivity of 0.000001 (0.001 x 0.001), or one row out
of 1,000,000, as shown in Figure 10-2. The optimum
execution plan for that query diagram reads 50 rows from
M and does nested loops, through primary-key
indexes, to the 50 rows from each of the other tables,
A1, A2, B1,
and B2. This is a very fast plan.


Figure 10-2. Promoting filters to higher tables, using denormalization
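
With both filter columns inherited by M, the query sketched after Figure 10-1 collapses to something like the following, using the same invented names and calling the two inherited columns B1_Filter_Col and B2_Filter_Col. For the plan to drive from M, the table also needs an index on the combined filter, such as a composite index on the two inherited columns.

SELECT M.Some_Col, A1.Other_Col, B2.Another_Col
FROM M, A1, A2, B1, B2
WHERE M.B1_Filter_Col = :b1_value   -- combined filter ratio 0.000001 on M
  AND M.B2_Filter_Col = :b2_value
  AND M.A1_ID = A1.A1_ID            -- nested loops through primary-key indexes
  AND M.A2_ID = A2.A2_ID
  AND A1.B1_ID = B1.B1_ID
  AND A2.B2_ID = B2.B2_ID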


It might seem odd to mention this powerful technique for combining
filters so late in the book.
Many texts assume that denormalization is
a ubiquitous necessity, but I have not found this to be the case at
all. On most applications I have tuned, I have not needed to add a
single denormalization for tuning purposes, and tuning purposes are
the only good reason for denormalization. If you follow the
techniques of the rest of this book, you too should need denormalization only rarely, and only when you identify a specific, important query that cannot otherwise be made fast enough.


Many queries can be made faster through denormalization, but that is
not the point. The point is that, even without denormalization,
almost all queries can be made fast enough, with correct tuning. An
improvement of a few milliseconds does not justify the costs of
denormalization. Most performance-driven denormalizations deliver
little more than a tiny improvement versus the best optimization
possible without denormalization.

If done perfectly, with rigorously designed database triggers that
automatically keep the denormalized data perfectly in sync,
denormalization can be functionally safe. However, it is not free; it
takes up disk space and slows inserts, updates, and deletes that must
fire the triggers that keep the data in sync. Given the unavoidable
costs of denormalization, I recommend it only when you find a
specific, high-priority SQL statement that cannot be made fast enough
by other means, which is a very rare case.
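
As a sketch of what such synchronizing triggers might look like for the simple one-level inheritance described earlier (a copy of B1's filter column stored on A1), consider the following. All names remain invented, the PL/SQL is Oracle-specific, and the sketch assumes every A1 row carries a non-null B1_ID.

-- Fill in the inherited column whenever an A1 row is created or is
-- repointed at a different B1 master row.
CREATE OR REPLACE TRIGGER A1_Inherit_B1_Filter
BEFORE INSERT OR UPDATE OF B1_ID ON A1
FOR EACH ROW
BEGIN
  SELECT B1.Filter_Col
    INTO :NEW.B1_Filter_Col
    FROM B1
   WHERE B1.B1_ID = :NEW.B1_ID;
END;
/

-- Propagate any change to the source column down to its denormalized
-- copies, so the redundant data never drifts out of sync.
CREATE OR REPLACE TRIGGER B1_Propagate_Filter
AFTER UPDATE OF Filter_Col ON B1
FOR EACH ROW
BEGIN
  UPDATE A1
     SET B1_Filter_Col = :NEW.Filter_Col
   WHERE A1.B1_ID = :NEW.B1_ID;
END;
/

This is exactly the extra work on inserts and updates that the preceding paragraph warns about; it is the price of keeping the copy trustworthy.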

Time goes, you say? Ah no! Alas, Time stays, we go.

Henry Austin Dobson, The Paradox of Time

