We can't end a chapter on query optimization without looking at some common tricks that can increase performance of some queries. While these are all rather specific, you may find techniques that can be applied in other circumstances.
Sometimes MySQL doesn't optimize a seemingly simple query the way you'd expect. A good example of this behavior occurred in a database used to track historical stock prices. There are two tables involved: SymbolHistory and Symbols.
As far as we're concerned, the Symbols table contains two important fields: Id and Symbol. The Id is an auto_increment primary key. Here's the PriceHistory table:
mysql> DESCRIBE PriceHistory; +----------+---------+------+-----+------------+-------+ | Field | Type | Null | Key | Default | Extra | +----------+---------+------+-----+------------+-------+ | SymbolID | int(11) | | PRI | 0 | | | Date | date | | PRI | 0000-00-00 | | | Open | float | | | 0 | | | High | float | | | 0 | | | Low | float | | | 0 | | | Close | float | | | 0 | | | Volume | float | | | 0 | | +----------+---------+------+-----+------------+-------+ 8 rows in set (0.01 sec)
It has a two-part index on (SymbolID, Date).
The Symbols table maps stock tickers to numeric identifiers. It also contains various other bits of metadata about each security. The PriceHistory table contains the historical price data. One of the most common queries run against the data is, "Show me all closing prices for a given stock sorted from newest to oldest."
To fetch the price history for IBM, the query looks like this:
mysql> EXPLAIN SELECT date_format(Date,'%Y%m%d') as Day, Close -> FROM Symbols, PriceHistory -> WHERE Symbols.ID=PriceHistory.SymbolID AND Symbols.Symbol = 'ibm' -> ORDER BY Date DESC \G *************************** 1. row *************************** table: Symbols type: const possible_keys: PRIMARY,Symbols_SymbolIDX key: Symbols_SymbolIDX key_len: 20 ref: const rows: 1 Extra: Using filesort *************************** 2. row *************************** table: PriceHistory type: ref possible_keys: PriceHistory_IDX key: PriceHistory_IDX key_len: 4 ref: const rows: 471 Extra: Using where 2 rows in set (0.01 sec)
Notice the Using filesort in the EXPLAIN output, which means MySQL will need to sort all the records based on the date. It turns out that the Date column is in the index, but MySQL can't use it directly for sorting because it's not the first part of a composite index. The result is a second pass over the rows to return them in the correct order. That sorting process can be slow when the query is run hundreds of times each minute on a large variety of stocks, some of which have thousands of records.
To improve the performance, we need to arrange it so that MySQL can query the PriceHistory and use the index on the Date column. The easiest way to do so is to break it up into two queries using a temporary variable, just like we did earlier to work around the lack of subselects:
mysql> SELECT @sid := Id FROM Symbols WHERE Symbol = 'ibm'; +------------+ | @sid := Id | +------------+ | 459378 | +------------+ 1 row in set (0.02 sec) mysql> EXPLAIN SELECT date_format(Date,'%Y%m%d') as Day, Close -> FROM PriceHistory WHERE SymbolID = @sid ORDER BY Date DESC \G *************************** 1. row *************************** table: PriceHistory type: ref possible_keys: PriceHistory_IDX key: PriceHistory_IDX key_len: 4 ref: const rows: 7234 Extra: Using where 1 row in set (0.00 sec)
An improvement like this can often mean the difference between a CPU-bound server handling 200 queries per second and a partially idle server handling 700 queries per second. The overhead associated with performing two queries instead of one is still usually less than that extra sorting pass.
Earlier we used a query like this to demonstrate that MySQL handles the situation efficiently:
mysql> EXPLAIN SELECT COUNT(*) FROM Headline -> WHERE ExpireTime >= 1112201600 AND Id <= 5000000 \G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: Headline type: range possible_keys: PRIMARY,ExpireTime key: ExpireTime key_len: 4 ref: NULL rows: 12009 Extra: Using where 1 row in set (0.00 sec)
In this example, MySQL uses the ExpireTime index to fetch a set of rows. It then applies the rest of the WHERE clause to eliminate those rows with ID values less than or equal to 5,000,000.
But what if the AND is changed to an OR condition, and we change it from a COUNT(*) to something a bit more meaningful?
mysql> EXPLAIN SELECT * FROM Headline -> WHERE ExpireTime >= 1012201600 OR Id <= 5000000 -> ORDER BY ExpireTime ASC LIMIT 10\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: Headline type: ALL possible_keys: PRIMARY,ExpireTime key: NULL key_len: NULL ref: NULL rows: 302116 Extra: Using where 1 row in set (0.00 sec)
Uh oh. MySQL has decided to perform a full table scan. Actually executing the full query (rather than just explaining it) takes almost three seconds. Let's think about why MySQL made this choice.
We know that MySQL will use only one index per table per query, and the Headline table has an index on Id as well as one on ExpireTime. So why didn't it pick either one?
No matter which index MySQL selects, it has to perform a full table scan to satisfy the other condition. Queries using OR conditions prevent MySQL from easily eliminating candidate rows. So rather than use one index to find some of the rows and then perform the table scan, MySQL decides that it's faster to simply use a table scan. This is slated to be fixed in MySQL 5.0.
In a well-normalized database, queries like the previous one tend not be very common. But when they do occur, they can be real performance killers. Luckily we can sometimes rewrite them using a UNION.
To do this, we'll break the query into two queries that can each use a single index. Then we'll merge and sort the results. The result looks like this:
(SELECT * FROM Headline WHERE ExpireTime >= 1081020749 ORDER BY ExpireTime ASC LIMIT 10) UNION (SELECT * FROM Headline WHERE Id <= 50000 ORDER BY ExpireTime ASC LIMIT 10) ORDER BY ExpireTime ASC LIMIT 10
The first query should be able to use the ExpireTime index while the second one uses the Id index. We must make sure to ask for the total number of rows desired (10) in both queries. The outer ORDER BY and LIMIT clauses will take care of the final sorting and counting.
It turns out that the UNION-based query runs in 0.02 seconds. That's far faster than the query it replaces. Just to make sure we understand what MySQL does, let's explain it:
mysql> EXPLAIN (SELECT * FROM Headline WHERE ExpireTime >= 1081020749 -> ORDER BY ExpireTime ASC LIMIT 10) -> UNION -> (SELECT * FROM Headline WHERE Id <= 50000 -> ORDER BY ExpireTime ASC LIMIT 10) -> ORDER BY ExpireTime ASC LIMIT 10 \G *************************** 1. row *************************** id: 1 select_type: PRIMARY table: Headline type: range possible_keys: ExpireTime key: ExpireTime key_len: 4 ref: NULL rows: 40306 Extra: Using where *************************** 2. row *************************** id: 2 select_type: UNION table: Headline type: range possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: NULL rows: 1 Extra: Using where; Using filesort 2 rows in set (0.00 sec)
Not bad at all. The second query needs a file sort operation, but at least it will use an index to locate all the rows.