SQL Tuning [Electronic resources] نسخه متنی

اینجــــا یک کتابخانه دیجیتالی است

با بیش از 100000 منبع الکترونیکی رایگان به زبان فارسی ، عربی و انگلیسی

SQL Tuning [Electronic resources] - نسخه متنی

| نمايش فراداده ، افزودن یک نقد و بررسی
افزودن به کتابخانه شخصی
ارسال به دوستان
جستجو در متن کتاب
بیشتر
تنظیمات قلم

فونت

اندازه قلم

+ - پیش فرض

حالت نمایش

روز نیمروز شب
جستجو در لغت نامه
بیشتر
لیست موضوعات
توضیحات
افزودن یادداشت جدید










2.7 Joins





Single-table queries quickly pale as
interesting tuning problems. Choices are few enough that even trial
and error will quickly lead you to the best execution plan.
Multitable queries yield much more interesting problems. To tune
multitable queries, you need to understand the different join types
and the tradeoffs between the different join execution methods.


2.7.1 Join Types


Let's begin by trying to understand clearly what a
multitable query means. First, consider how databases interpret
joins, the operations that combine rows from
multiple tables into the result a query demands.
Let's begin with the simplest imaginable multitable
query:

SELECT ... FROM Orders, Order_Details;

With no WHERE clause at all, the database has no
instructions on how to combine rows from these two large tables, and
it does the logically simplest thing: it returns
every possible combination of rows from the tables. If you
had 1,000,000 orders and 5,000,000 order details, you would get (if
you could wait long enough) 5,000,000,000,000 rows back from the
query! This is the rarely used and even more rarely useful

Cartesian join. The
result, every combination of elements from two or more sets, is known
as the Cartesian product. From a business
perspective, you would have no interest in combining data from orders
and order details that had no relationship to each other. When you
find Cartesian joins, they are almost always a mistake.


The most common, but still very rare, exception to this rule is when
one of the tables returns only a single row. In that case, you can
view a Cartesian join query as a more sensible combination of results
from a single-row query appended, for convenience, to results of a
logically independent multirow query.


2.7.1.1 Inner joins



Any given order-processing application
would surely need details pertaining to given orders, so you
aren't too likely to see a Cartesian join. Instead,
you would more likely see a join condition that tells the database
how the tables relate:

SELECT ... FROM Orders O, Order_Details D WHERE O.Order_ID=D.Order_ID;

Or, shown in the newer-style notation:

SELECT ... FROM Orders O 
INNER JOIN Order_Details D ON O.Order_ID=D.Order_ID;

Logically, you can still think of this as a Cartesian product with a
filtered result: "Give me all the combinations of
orders and order details, but discard those combinations that fail to
share the same order ID." Such a join is referred to
as an inner join. Even in the worst cases,
databases virtually always come up with a better way to find the
requested rows than this brute-force method of Cartesian product
first, followed by discards based on unmet join conditions. This is
fortunate indeed, since a many-way Cartesian join between several
large tables could take years or even eons to complete. Most joins,
like the one shown, consist of an equality between a foreign key in
some detail table matched to a primary (unique) key in some master
table, but any condition at all that mentions two or (rarely) more
tables is a join condition.

From a procedural perspective, the database has several choices for
how best to join the tables in the preceding queries:

Start with the master table and find the matching details.

Start with the detail table and look up the matching master rows.

Go to both sets of rows independently (but not with a Cartesian
product) and somehow merge those rows that match up on the joined
column values.


While they yield identical functional results, these alternatives
often yield radically different performance, so this book will
explain at length how to choose the best alternative.


2.7.1.2 Outer joins



A common
alternative to the inner join is the outer join.
An outer join is easiest to describe in procedural terms: start with
rows from a driving table and find matching rows, where possible,
from an outer-joined table, but create an artificial all-nulls
matching row from the outer-joined table whenever the database finds
no physical matching row. For example, consider the old-style Oracle
query:

SELECT ..., D.Department_Name FROM Employees E, Departments D
WHERE E.Department_ID=D.Department_ID(+);

In the newer style notation, on any database, you query the same
result with:

SELECT ..., D.Department_Name 
FROM Employees E
LEFT OUTER JOIN Departments D
ON E.Department_ID=D.Department_ID;

These queries return data about all employees, including data about
their departments. However, when employees have no matching
department, the query returns the employee data anyway, filling in
inapplicable selected fields, such as
D.Department_Name, with nulls in the returned
result. From the perspective of tuning, the main implication of an
outer join is that it eliminates a path to the data that drives from
the outer-joined table to the other table. For the example, the
database cannot start from the Departments table
and find the matching rows in the Employees table,
since the database needs data for all employees, not just the ones
with matching departments. Later, I will show that this restriction
on outer joins, limiting the join order, is not normally important,
because you would not normally want to join in the forbidden
order.


2.7.2 Join Execution Methods


The join types determine which results a
query requires but do not specify how a database should execute those
queries. When tuning SQL, you usually take as a given which results a
query requires, but you should control the execution method to force
good performance. To choose execution methods well, you need to
understand how they work.


2.7.2.1 Nested-loops joins



The simplest method of performing an
efficient join between two or more tables is the
nested-loops join,
illustrated in Figure 2-7.


Figure 2-7. Nested-loops joins


The query execution begins with what amounts to a single-table query
of the
driving
table (the first table the database reads), using only the
conditions that refer solely to that table. Think of the leftmost box
in Figure 2-7, with the attached crank, as a
machine to perform this single-table query. It separates
uninteresting rows (destined for the wastebasket at lower left) from
interesting rows (rows that satisfy the single-table query
conditions) from the driving table. Since the query is a join, the
database does not stop here. Instead, it passes the result rows from
that first box to the next box. The job of the second box is to take
rows from the first box, one at a time, find matching rows from the
first joined-to table, then again discard rows that do not meet query
conditions on tables so far touched, while passing on rows that meet
these conditions. The database usually performs this matching step in
a nested-loops join by indexed lookups on the join key, repeated for
each row from the driving table.

When the join is an inner join, the database discards rows from the
earlier step that fail to find matches in the joined-to table. When
the join is an outer join, the database fills in joined-to table
values with nulls when it fails to find a match, retaining all rows
from the earlier step. This process continues with further boxes to
join each of the rest of the tables in the same way until the query
is finished, leaving a fully joined result that satisfies all query
conditions and joins.

Internally, the database implements this execution plan as a nested
series of loopsthe outermost loop reads rows off the driving
table, the next loop finds matching rows in the first joined table,
and so onwhich is why these are called
nested-loops joins. Each point in the process
needs to know only where it is at that moment and the contents of the
single result row it is building, so this process requires little
memory to execute and no scratch space on disk. This is one of the
reasons that nested-loops plans are robust: they
can deliver huge result sets from huge tables without ever running
out of memory or disk scratch space, as long as you can wait long
enough for the result. As long as you choose the right join order,
nested loops work well for most sensible business queries. They
either deliver the best performance of all join methods, or they are
close enough to the best performance that the robustness advantage of
this join method weighs in its favor, in spite of a minor performance
disadvantage.


When speaking of robustness, I speak only of the join method.
Independent of the join, the query might call for a large sorted
result (for example, with an ORDER BY clause) that
requires a large combination of memory and disk scratch space,
regardless of the join method.


2.7.2.2 Hash joins



Sometimes, the database should access
the joined tables independently and then match rows where it can and
discard unmatchable rows. The database has two ways to do this:
hash joins, which I discuss in this section, and
sort-merge joins, which I
discuss next. Figure 2-8 illustrates a hash join.


Figure 2-8. A hash join


In Figure 2-8, each of the top two boxes with
cranks acts like an independently optimized single-table query. Based
on table and index statistics, the cost-based optimizer[6] estimates which of these two independent tables will
return fewer rows after discarding filtered rows. It chooses to hash
the complete result from that single-table query. In other words, it
performs some randomizing mathematical function on the join key and
uses the result of that function to assign each row to a
hash
bucket. The ideal hash algorithm spreads the rows fairly
evenly between a number of buckets approximating the number of rows.

[6] Oracle's rule-based optimizer will never
choose a hash join, but its cost-based optimizer often will. The
other database vendors do not offer rule-based optimizers.


Since the hash buckets are based on the smaller result set, you can
hope those hash buckets fit entirely in memory, but, if necessary,
the database temporarily dedicates scratch disk space to hold the
buckets. It then executes the larger query (as shown in the
upper-right box in Figure 2-8), returning the
driving rowset. As each row exits this step, the database executes
the same hash function in its join key and uses the hash-function
result to go directly to the corresponding hash bucket for the other
rowset. When it reaches the right hash bucket, the database searches
the tiny list of rows in that bucket to find matches.

Note that there may not be any matches. When the database finds a
match (illustrated in the lower-middle box with a crank), it returns
the result immediately or sends the result on to the next join, if
any. When the database fails to find a match, the database discards
that driving row.

For the driving rowset, a hash join looks much like a nested-loops
join; both require that the database hold just the current row in
memory as it makes a single pass through the rowset. For the smaller,
prehashed rowset, however, the hash join approach is less robust if
the prehashed rowset turns out to be much larger than expected. A
large prehashed rowset could require unexpected disk scratch space,
performing poorly and possibly even running out of space. When you
are fully confident that the smaller rowset is and always will be
small enough to fit in memory, you should sometimes favor the hash
join over a nested-loops join.


2.7.2.3 Sort-merge joins



The sort-merge join, shown in Figure 2-9, also reads the two tables independently, but,
instead of matching rows by hashing, it presorts both rowsets on the
join key and merges the sorted lists. In the simplest implementation,
you can imagine listing the rowsets side-by-side, sorted by the join
keys. The database alternately walks down the two lists, in a single
pass down each list, comparing top rows, discarding rows that fall
earlier in the sort order than the top of the other list, and
returning matching rows. Once the two lists are sorted, the matching
process is fast, but presorting both lists is expensive and
nonrobust, unless you can guarantee that both rowsets are small
enough to fit well within memory.


Figure 2-9. A sort-merge join


When you compare a sort-merge join with a nested-loops alternative,
the sort-merge join has roughly the same potential advantages that
hash joins sometimes have. However, when you compare a sort-merge
join with a hash join, the hash join has all the advantages: it
avoids placing the larger rowset into memory or scratch disk space,
with essentially no downside cost. Therefore, when you have a choice
between a sort-merge join and a hash join, never choose a sort-merge
join.


2.7.2.4 Join methods summary


In summary, you should usually choose nested loops for both
performance and robustness. You will occasionally benefit by choosing
hash joins, when independent access of the two tables saves
significantly on performance while carrying a small enough robustness
risk. Never choose sort-merge joins, unless you would prefer a hash
join but find that method unavailable (for example, as with
Oracle's rule-based optimizer). In later chapters, I
will treat nested-loops joins as the default choice. The rules that
cover when you can benefit by overriding that default are complex and
depend on the material in the later chapters. The main purpose of
this section was to explain the internal database operations behind
that later discussion.


/ 110