4.4 Controlling Plans on SQL Server
There
are three main steps involved in tuning on SQL Server:Provide the optimizer with good statistics about the tables and
indexes, so it can calculate the costs of alternatives accurately.Modify the query to prevent execution plans that you do not want,
mainly using methods specific to SQL Server.Force a simple execution plan with FORCEPLAN when
necessary.
4.4.1 SQL Server Optimization Prerequisites
Proving that a little knowledge is a dangerous thing, cost-based
optimizers often do a terrible job if they do not have statistics on
all the tables and indexes involved in the query. It is therefore
imperative to maintain statistics on tables and indexes reliably;
this includes regenerating statistics anytime table volumes change
much or anytime tables or indexes are rebuilt. It is safest to
regenerate statistics periodically, during times when load is
relatively quiet, nightly or at least weekly. Run the following from
Query Analyzer, then cut and paste the
resulting UPDATE STATISTICS commands into the
query window and run them as well:
-- file called updateall.sql
-- update your whole database
SELECT 'UPDATE STATISTICS ', name
FROM sysobjects
WHERE type = 'U'
Often, queries include conditions on highly skewed distributions,
such as conditions on special types, codes, or flags, when these
columns have only a few values. SQL Server automatically maintains
statistics on the distribution of indexed column values, enabling SQL
Server to estimate selectivities automatically, even when indexed
columns have skewed distributions.Occasionally, it is useful to help the SQL Server estimate the
selectivity of condition with a skewed distribution even when the
distribution applies to a nonindexed column. In such a case, you need
to specially request data on that column. For example, to request a
statistics group named Eflag on the nonindexed
column Exempt_Flag of the
Employees table, run:
CREATE STATISTICS EFlag on Employees(Exempt_Flag)
As an example of a case in which such skewed distributions might
apply, consider an Orders table in which you have
a Status_Code column with three possible values:
'CL' for closed (i.e.,
fulfilled) orders, 'CA' for
cancelled orders, and 'OP'
for open orders. Most orders, by far, would be
fulfilled once the application has been running for a few months,
causing a steady rise in 'CL' values. A steady,
significant fraction of orders would end up cancelled, so the value
'CA' would also eventually point to a large list
of orders. However, as long as the business keeps up with incoming
orders, the number of open orders would remain moderate and steady,
even as data accumulates for years. Quite early, a condition that
specified Status_Code='OP' would be selective
enough to prefer joining to the corresponding table early, even if
Status_Code is not indexed, and it is important to
enable the optimizer to realize this fact, preferably without a lot
of manual tuning. This requires that your SQL actually mention the
specific value that applies to the condition, rather than use a
generic stored procedure that only fills in the value of the constant
after the parse, at execution time.
4.4.2 Modifying the Query
You should usually tune SQL Server with
hints. Hints generally go in either the FROM
clause, when they apply to a specific table access, or in the SQL
Server OPTION( ) clause at the very end of a
query. These are the most useful hints:WITH (INDEX(
<Index_Name>))
Immediately following a table alias in a FROM
clause, this hint instructs SQL Server to use the specified index to
access that table alias. The older alternative syntax
INDEX=<Index_Name>
is also supported, but it might be dropped in the future, so I
don't recommend it. Even more obsolete and dangerous
is the still-supported method of naming the internal object ID that
corresponds to the desired index. Naming the index you want with the
ID is horribly unreliable, because the index will get a new ID if
anyone ever drops it and recreates it, or if the application moves to
a new SQL Server database.
WITH (INDEX(0))
Immediately following a table alias in a FROM
clause, this hint instructs SQL Server to use a full table scan to
reach that table alias.
WITH (NOLOCK)
Immediately following a table alias in a FROM
clause, this hint instructs SQL Server to read the specified table
alias without requiring read locks or otherwise enforcing a
consistent read. Read locks on SQL Server can create a bottleneck
when combined with heavy update activity on a table. This hint avoids
such a bottleneck, potentially at the cost of a consistent view of
the data as of a single moment in time.
LOOP
and HASH
These two different hints can each immediately precede the
JOIN keyword in the FROM
clause, instructing SQL Server to perform the specified join with the
specified join method. These hints require the new-style join syntax
with the JOIN keyword in the
FROM clause. The presence of even a single hint of
this type also forces all joins to take place in the same order the
aliases are listed in the FROM clause.
OPTION(LOOP JOIN)
This hint goes at the end of a query and forces all joins to follow
nested loops.
OPTION(FORCE ORDER)
This hint goes at the end of a query and forces all joins to take
place in the same order the aliases are listed in the
FROM clause.
OPTION(FAST 1)
This hint simply instructs SQL Server to attempt to reach the first
returned rows as quickly as possible, which generally favors a
nested-loops execution plan. Its effect is much like the
OPTION(LOOP JOIN) hint, although in theory SQL
Server might recognize that no execution plan could reach the first
rows quickly in a query with an explicit ORDER BY,
nullifying any effect from OPTION(FAST 1). The
OPTION(FAST 1) hint is the equivalent of the
FIRST_ROWS hint on Oracle and the
OPTIMIZE FOR 1 ROW hint on DB2.
These hints can be combined. You can place multiple hints within a
single WITH clause, separating them with
commasfor example, WITH
(INDEX(Employee_First_Name), NOLOCK). Multiple
hints in a single OPTION clause also are separated
by commasfor example, OPTION(LOOP JOIN, FORCE
ORDER). Together, these hints give full control of the join
order, the join methods, and the table access methods.
4.4.3 Hint Examples
I'll demonstrate tuning with hints on a couple of
queries. If you choose a robust all-nested-loops plan that drives
from the employee last name to the other tables in optimum order,
using the primary keys to reach the other tables, this
query's hints force the desired plan:
SELECT E.First_Name, E.Last_Name, E.Salary, LE.Description,
M.First_Name, M.Last_Name, LM.Description
FROM Employees E WITH (INDEX(Employee_Last_Name))
INNER JOIN Locations LE WITH (INDEX(Location_PKey))
ON E.Location_ID=LE.Location_ID
INNER JOIN Employees M WITH (INDEX(Employee_PKey))
ON E.Manager_ID=M.Employee_ID
INNER JOIN Locations LM WITH (INDEX(Location_PKey))
ON M.Location_ID=LM.Location_ID
WHERE E.Last_Name = 'Johnson'
AND LE.Description='Dallas'
OPTION(LOOP JOIN, FORCE ORDER)
SET SHOWPLAN_TEXT ON (as described in Chapter 3) generates the following results when you run
this query from SQL Server Query Analyzer:
StmtText
-----------------------------------------------------------------
|--Bookmark Lookup(...(...[Locations] AS [LM]))
|--Nested Loops(Inner Join)
|--Bookmark Lookup(...(...[Employees] AS [M]))
| |--Nested Loops(Inner Join)
| |--Filter(WHERE:([LE].[Description]='Dallas'))
| | |--Bookmark Lookup(...(...[Locations] AS [LE]))
| | |--Nested Loops(Inner Join)
| | |--Bookmark Lookup(...(...[Employees] AS [E]))
| | | |--Index Seek(...(...
(wrapped line) [Employees].[Employee_Last_Name]
(wrapped line) AS [E]), SEEK:([E].[Last_Name]='Johnson') ORDERED)
| | |--Index Seek(...(...[Locations].[Location_PKey]
(wrapped line) AS [LE]), SEEK:([LE].[Location_ID]=[E].[Location_ID]) ORDERED)
| |--Index Seek(...(...[Employees].[Employee_PKey]
(wrapped line) AS [M]), SEEK:([M].[Employee_ID]=[E].[Manager_ID]) ORDERED)
|--Index Seek(..(...[Locations].[Location_PKey]
(wrapped line) AS [LM]), SEEK:([LM].[Location_ID]=[M].[Location_ID]) ORDERED)
(12 row(s) affected)
If you don't want to specify all nested loops, you
might need the join HASH and
LOOP hints, as shown in the following alternative
to the last query:
SELECT E.First_Name, E.Last_Name, E.Salary, LE.Description,
M.First_Name, M.Last_Name, LM.Description
FROM Employees E WITH (INDEX(Employee_Last_Name))
INNER HASH JOIN Locations LE WITH (INDEX(Location_Description))
ON E.Location_ID=LE.Location_ID
INNER LOOP JOIN Employees M WITH (INDEX(Employee_PKey))
ON E.Manager_ID=M.Employee_ID
INNER LOOP JOIN Locations LM WITH (INDEX(Location_PKey))
ON M.Location_ID=LM.Location_ID
WHERE E.Last_Name = 'Johnson'
AND LE.Description='Dallas'
The preceding query delivers the following execution plan, triggered
by SET SHOWPLAN_TEXT ON:
StmtText
-------------------------------------------------------------
|--Bookmark Lookup(...(...[Locations] AS [LM]))
|--Nested Loops(Inner Join)
|--Bookmark Lookup(...(...[Employees] AS [M]))
| |--Nested Loops(Inner Join)
| |--Hash Match(Inner Join...
(wrapped line) ([E].[Location_ID])=([LE].[Location_ID])...)
| | |--Bookmark Lookup(...(...[Employees] AS [E]))
| | | |--Index Seek(...(...[Employees].[Employee_Last_Name]
(wrapped line) AS [E]), SEEK:([E].[Last_Name]='Johnson') ORDERED)
| | |--Bookmark Lookup(...(...[Locations] AS [LE]))
| | |--Index Seek(...(...[Locations].[Location_Description]
(wrapped line) AS [LE]), SEEK:([LE].[Description]='Dallas') ORDERED)
| |--Index Seek(...(...[Employees].[Employee_PKey]
(wrapped line) AS [M]), SEEK:([M].[Employee_ID]=[E].[Manager_ID]) ORDERED)
|--Index Seek(...(...[Locations].[Location_PKey]
(wrapped line) AS [LM]), SEEK:([LM].[Location_ID]=[M].[Location_ID]) ORDERED)
(11 row(s) affected)
There are two basic extremes involved in tuning with hints such as
those in this example:Use as little direction as possible to get the execution plan you
want, or at least to get close enough to the plan you want for
reasonable performance. This approach reasons that SQL Server has
more information than you have, and it should be left free to adapt
to changing data distributions and take advantage of improvements in
SQL Server with future releases. By leaving SQL Server the maximum
degrees of freedom, you maximize its power to optimize well for you
in the future. However, until you try, you won't
know how much direction the SQL Server will need if it did not get
the plan right in the first place, so this approach is likely to be
iterative, involving adding one hint at a time until SQL Server
delivers a good plan.If you did not get the plan you wanted from SQL Server automatically,
assume the database has made bad assumptions that will propagate to
distort all of its calculations. Therefore, leave it with very little
freedom, specifying essentially the whole plan you want.
If you are confident in your chosen execution plan, as you should be
if you apply the methods I describe later in this book, there is
little reason to hold back from fully specifying that plan. I have
yet to find a case in which a well-chosen, robust execution plan
needed to evolve to handle new data distributions or new database
features. On the other hand, it is easy for SQL with a partially
restricting set of hints to go wrong, especially if some table or
index loses its statistics. When SQL Server chooses incorrectly, the
error that made the database choose incorrectly is likely to
propagate over the entire plan. However, the OPTION(FAST
1) hint is the sort of instruction that can be useful even
when SQL Server has perfect information, simply specifying that the
time to reach the first row is more important than the time to reach
the last row.
4.4.4 Using FORCEPLAN
An older method to tune on both
Microsoft SQL Server and Sybase is the FORCEPLAN
option. You execute this option with a standalone SQL statement:
SET FORCEPLAN ON
This option affects all SQL for that connection until you execute
this statement:
SET FORCEPLAN OFF
When FORCEPLAN is ON, the
database is instructed to perform only the simplest optimization on
the SQL it sees. It generally uses nested-loops execution plans that
drive through indexes and join tables in the same order you list them
in the FROM clause. When this is the sort of plan
you want, SET FORCEPLAN can be ideal, not only
forcing the plan simply, but even saving parse time that would
otherwise be wasted considering a much wider range of plans,
especially for joins of many tables. It is a blunt-edged sword, so to
speak, so only use it when you know the
FROM-clause join order is the correct join order
and you want nested loops.
• Table of Contents• Index• Reviews• Examples• Reader Reviews• Errata• AcademicSQL TuningBy