5.2 Optimizer Features and Oddities
When testing queries, always remember
to use realistic data. A common source of problems with MySQL is the
query optimizer's handling of test data. It often
does surprising things. If you don't know what
it's doing and why (and it rarely tells you why),
you may spend a lot of time tracking down a problem that really
isn't there. Or, worse yet, you may embarrass
yourself asking about it on the MySQL mailing list, only to learn
that you've created the problem all on your own.In general, MySQL uses an index when it is
reasonably confident that doing so is more
efficient than not doing so. This leads to false negatives
during testing. The false negative tends to occur in the two
situations that we'll now investigate.
5.2.1 Too Little Diversity
Even if you have a lot of data (thousands of
rows or more), MySQL may choose to ignore your indexes some of the
time if your data doesn't have sufficient diversity.
Why might that happen? Imagine you have a table that contains
historical climate data for most world cities:
CREATE TABLE weather
(
city VARCHAR(100) NOT NULL,
high_temp TINYINT NOT NULL,
low_temp TINYINT NOT NULL,
the_date DATE NOT NULL,
INDEX (city),
INDEX (the_date),
)
Rather than loading all two million records, you load two years worth
of data (1980 and 1981) to test. After some testing, you find that
queries that need to access many of the records are using full table
scans rather than the the_date index. For example,
to find the average high temperature in 1980, you might write
something like this:
SELECT AVG(high_temp) FROM weather
WHERE the_date BETWEEN '1980-01-01' AND '1980-12-31';
Having data from only 1980 and 1981 loaded, that query needs to
examine 50% of the rows in the weather table. In
such a case, MySQL decides that it is faster to simply scan the
entire table.How does it know? When you cross a certain threshold, it is slower to
locate rows using an index than to read them sequentially. For MySQL,
the cutoff point is roughly 30%. The number is chosen by the MySQL
developers based on their extensive experience (and knowledge of the
code) and is subject to change from release to release. The actual
number is specific to each storage engine: InnoDB has a different
threshold than MyISAM tables, and so forth.The main reason index performance is worse in these circumstances
goes all the way down to the hardware: disk seek performance. Indexes
are always sorted, but the data on disk is not. Using an index means
accessing the rows in index-sorted order rather than in the order
they reside on disk. The end result is more time spent moving around
the disk and less time reading data. Sequential reads are always
going to be faster than random seeks. If you're
lucky enough to be using a RAM disk, most of the overhead vanishes.You can draw two conclusions from this knowledge. First, if a table
really is going to remain very small, you may want to leave off the
indexes. (Unique indexes are an exception to this rule. Without them
you can't enforce a unique constraint on the table.)
The second conclusion merely reinforces what we said
earlieralways use a representative data set for your testing.
It should be representative both in terms of size and diversity.One special case that must be mentioned is that of index-only
queries. If you happen to write a query that requires only columns
contained within a single index, you'll be
pleasantly surprised. MySQL is smart enough to realize that all the
required data is present in the index, so it doesn't
bother to fetch any of the rows from disk. This, obviously, provides
you with excellent performance.
5.2.2 Index-Based Ordering
One of MySQL's weak
points is sorting. It can usually fetch 15,000 rows in a heartbeat,
but if you happen to need them in any particular order it may take
quite a bit more time.[2][2] Of course, performance is
always relative. we've seen queries that MySQL
answered in 20 ms take 200 ms after adding an ORDER
BY clause. For many applications, 200 ms is still quite
fast.
The problem is really two-fold. First, sorting is simply more work,
and work takes time. Aside from adding a faster CPU,
there's no avoiding that fact. If
you're not sorting on a computed field, your first
instinct is likely to add an index on the sorting column.
Unfortunately, that rarely helps. As you'll remember
from Chapter 4, MySQL uses at most one index
per table per query. Odds are that you're already
using an index on the table in question, so MySQL will not touch your
new index.The solution to the second problem also goes back to Chapter 4. Add the sorting column as a second part in
the existing index. By doing so you get the best of both worlds.
You'll have an index MySQL can use to quickly locate
rows (just as before) and an index that provides
order to the data. That removes the need for MySQL to make a sorting
pass over the results.Going back to the weather example, to speed up queries like this:
SELECT * FROM weather WHERE city = 'Toledo' ORDER BY the_date DESC
you'd change the index on city to
an index on (city, the_date):
ALTER TABLE weather DROP INDEX city, ADD INDEX (city, the_date)
Remember that the order of columns is significant. The leftmost
prefix rule dictates that city must appear first
in the index to be used for that query.Taking things a step further, you might then be tempted to remove the
single index on the_date. Don't
do it unless you're sure there are no queries using
the_date in their WHERE clause.
A query based on the_date can't
be satisfied using the new index on (city,
the_date) because the_date
isn't a leftmost prefix in the index.
5.2.3 Impossible Queries
MySQL performs a basic logical
analysis of the WHERE clause of every query. In
doing so, it can often detect when you've asked for
something that doesn't make any sense:
SELECT * FROM mytable WHERE id < 5000 and id > 30000
If it finds an impossible WHERE clause, it returns
zero records, sparing the expense of running an otherwise pointless
and possibly expensive query.If you suspect that MySQL has optimized away an impossible
WHERE clause, simply ask it to
EXPLAIN the query. If you see a result like this:
mysql> SELECT * FROM mytable WHERE id < 5000 and id > 30000
+-----------------------------------------------------+
| Comment |
+-----------------------------------------------------+
| Impossible WHERE noticed after reading const tables |
+-----------------------------------------------------+
1 row in set (0.00 sec)
you'll know what it was thinking.Aside from making a simple typo, it's unlikely that
you'll run many queries like that. However, if
you're building an application on top of MySQL and
happen to make a typo or a serious logic error in the code, you can
end up running lots of pointless queries before tracking down the
problem. It's good to know that MySQL
doesn't waste much time dealing with your illogical
queries.
5.2.4 Full-Text Instead of LIKE
Chapter 4,
it's clear that full-text indexes are much faster
than using a LIKE clause in your queries to search
for a word or phrase. In the vast majority of cases, you should use a
full-text index to tackle these types of problems.However, there are times when this can be problematic. The query
optimizer doesn't look very closely at full-text
indexes when deciding which index to use for a table. In fact, if
there's a usable full-text index, the optimizer will
always prefer it regardless of how many rows it actually eliminates
from the result set. Hopefully this will be fixed in a future version
of MySQL.