9.5 Unsolvable Problems
So far,
I've described strategies for solving what I would
call pure SQL tuning problems, where the text of
a slow query constitutes essentially the whole problem statement and
where you can tune that query out of context with the rest of the
application. The following narrowly defined performance problem
statement illustrates the boundaries of the problem
I've attacked so far:
Alter a given SQL statement to deliver precisely the same rows it
already delivers (thus requiring no change to the rest of the
application) faster than some predefined target runtime.
Assuming the predefined target runtime is set rationally according to
real business needs, there are four basic ways that this problem can
be unsolvable:
The query runs so many times that the
target runtime must be set very low. This is particularly true of
queries that the application repeats hundreds of times or more to
resolve a single, online, end-user request.
The query must return too many table rows
(taking into account the number of tables) to have any hope of
meeting the target, even with excellent caching in the database.
The query must
aggregate too many table rows (taking into account the number of
tables) to have any hope of meeting the target, even with excellent
caching in the database.
The layout of the data
provides no potential execution plan that avoids high running
rowcounts at some point in the execution plan, even while the final
returned rowcount is moderate.
In this section, I describe how to recognize each type of problem. In
Chapter 10, I will discuss solving these
"unsolvable" problems by stepping
outside the box that the too-narrow problem statement imposes.The first problem type is self-explanatory: the target is way lower
than an end user should care about. The explanation turns out to be
that a given end-user action or batch process requires the query to
run hundreds or even millions of times to perform what, to the end
user, is a single task.The second and third problem types are easy to recognize in a query
diagram. Like Figure 9-5, the query, which reads at
least one large table, turns out to have no filters or only a couple
of weakly selective filters. If you have no aggregation (no
GROUP BY, for example), you
have the second problem. When you test the best nested-loops plan you
can find, with any row-ordering steps removed so you can see rows
before sorting, you discover that the query returns rows at a very
high rate but returns so many rows that it runs interminably. The
third problem, with a GROUP BY or other
aggregation, looks just like the second problem type, but with the
addition of aggregation.The fourth problem type is the subtlest. A query diagram with at
least one large table has several semiselective filters spread
throughout a query diagram, often with subqueries, in such a way that
large fractions of one or more large tables must be read before you
can reach enough filters to reduce the rowcount to something
moderate. This case might sound like it should be quite common, but
it actually turns out to be rare; I've run into it
probably less than once a year, on average.
• Table of Contents• Index• Reviews• Examples• Reader Reviews• Errata• AcademicSQL TuningBy