15.9 Tuning MySQL
To
conclude this chapter, we show you selected techniques for improving
the performance of your databases, queries, and MySQL server. We
focus on how to choose and design indexes, tips for querying and
database design, how to tune the server parameters, and how to use
MySQL''s query cache.
15.9.1 Index Design
As discussed in Chapter 5, each table should have a PRIMARY
KEY definition as part of its CREATE
TABLE statement. A primary key is an attribute (or set of
attributes) that uniquely identifies a row in a table. Storing two
rows with the same primary key isn''t permitted and
an attempt to INSERT duplicate primary keys
produces an error (unless you use the IGNORE
modifier).
The attribute values of the primary key are stored in an
index to allow fast access to a row using the
primary key values. The default index type for a MyISAM table type is
fast for queries that find a specific row, a range of rows, for joins
between tables, grouping data, ordering data, and finding minimum and
maximum values. Indexes don''t provide any speed
improvement for retrieving all the rows in a table or for other query
types.
As discussed briefly in Chapter 5, indexes are
also useful for fast access to rows by values other than those in the
primary key. For example, in the customer table,
you might define an index by adding it using:
ALTER TABLE customer ADD INDEX namecity (surname,firstname,city);
After you define this index, some queries that select a particular
customer through a WHERE clause automatically use
it. Consider an example:
SELECT * FROM customer WHERE surname = ''Marzalla''
AND firstname = ''Dimitria'' AND city = ''St Albans'';
This query can use the new index to quickly locate the row that
matches the search criteria. Without the index, the server must scan
all the rows in the customer table and compare
each row to the WHERE clause. This might be quite
slow and certainly requires significantly more disk activity than the
index-based approach (assuming the table has more than a few rows).
A particular feature of database servers is that they develop a query
evaluation strategy and optimize it without any interaction from the
user or programmer. If an index is available, and it makes sense to
use it in the context of a query, the server does this automatically.
All you need to do is identify which
queries are common, and make an index
available for those common queries by adding the
KEY
clause to the
CREATE
TABLE statement or using ALTER
TABLE on an existing table.
If you''ve created the namecity
index, and you want to check that MySQL will use it for the previous
query, you can do so with the EXPLAIN statement:
EXPLAIN SELECT * FROM customer WHERE surname = ''Marzalla''
AND firstname = ''Dimitria'' AND city = ''St Albans'';
This reports that:
+----+-------------+----------+------+---------------+----------+---------+----
---------------+------+--------+
| id | select_type | table | type | possible_keys | key | key_len | ref
| rows | Extra |
+----+-------------+----------+------+---------------+----------+---------+----
---------------+------+--------+
| 1 | SIMPLE | customer | ref | namecity | namecity | 153 | const,
const,const | 1 | Using where |
+----+-------------+----------+------+---------------+----------+---------+----
---------------+------+--------+
1 row in set (0.06 sec)
You can see that the namecity index is listed as a
possible choice in the possible_keys column and,
as expected, it''s the index that''ll
be used to evaluate the query as shown in the key
column. The EXPLAIN
statement is a useful diagnostic tool
for understanding and optimizing complex queries but we
don''t discuss it in detail here; you can find out
more about it in Section 5.2.1 of the MySQL manual.
Careful index design is important. The namecity
index we have defined can also speed queries other than those that
supply a complete surname,
firstname, and city. For
example, consider a query:
SELECT * FROM customer WHERE surname = ''LaTrobe'' AND
firstname = ''Anthony'';
This query can also use the index namecity,
because the index permits access to rows in sorted order first by
surname, then firstname, and
then city. With this sorting, all
"LaTrobe, Anthony" index entries
are clustered together in the index. Indeed, the index can also be
used for the query:
SELECT * FROM customer WHERE surname LIKE ''Mar%'';
Similarly, all surnames beginning with
"Mar" are clustered together in the
index. You can use EXPLAIN to check that the index
is being used.
However, the index can''t be used for a query such
as:
SELECT * FROM customer WHERE firstname = ''Dimitria'' AND
city = ''St Albans'';
The index can''t be used because the leftmost
attribute named in the index, surname,
isn''t part of the WHERE clause.
In this case, all rows in the customer table
must be scanned and the query is much slower (again assuming there
are more than a few rows in the customer table,
and assuming there is no other index that could be used).
|
There are other cases in which an index can''t be
used, such as when a query contains an OR that
isn''t on an indexed attribute:
SELECT * FROM customer WHERE surname = ''Marzalla'' OR zipcode = "3001";
Again, the customer table must be completely
scanned, because the second condition,
zipcode="3001", requires all rows to be retrieved
as there is no index available on the attribute
zipcode. Also, the attributes that are combined
together with OR must be the leftmost attributes
in the index; otherwise the query requires a complete scan of the
customer table. The following example requires a
complete scan:
SELECT * FROM customer WHERE firstname = ''Dimitria'' OR
surname = ''Marzalla'';
If all the attributes in the index are used in all the queries, to
optimize index size, the leftmost attribute in the
KEY clause should be the attribute with the
highest number of duplicate entries.
Because indexes speed up queries, why wouldn''t you
create indexes on all the attributes you can possibly search on? The
answer is that while indexes are fast for searching, they consume
space and require updates each time rows are added or deleted, or key
attributes are changed. So, if a database is largely static,
additional indexes have low overheads, but if a database changes
frequently, each additional index slows down the update process
significantly. In either case, indexes consume additional space on
disk and in memory, and unnecessary indexes should be avoided.
One way to reduce the size of an index and speed updates is to create
an index on a prefix of an attribute. Our namecity
index uses considerable space: for each row in the
customer table, an index entry is up to 150
characters in length because it is created from the combined values
of the surname, firstname, and
city attributes.[2]
[2] This
isn''t the space actually required by an index entry,
because the data is compressed for storage. However, even with
compression, the fewer characters that are indexed, the more compact
is the representation, the more space is saved, and (depending on the
usability of the index) the faster searching and updates are.
To reduce space, you can define the index as:
ALTER TABLE customer ADD INDEX namecity
(surname(10),firstname(3),city(2));
This uses only the first 10 characters of surname,
3 of firstname, and 2 of city
to distinguish index entries. This is quite reasonable, because 10
characters from a surname distinguishes between most surnames, and
the addition of a few characters from a first name and the prefix of
their city should be sufficient to uniquely identify almost all
customers. Having a smaller index with less information can also mean
that queries are actually faster, because more index information can
be retrieved from disk per second, more of the index can fit into
spare memory, and disk retrieval speed is almost always the
bottleneck in query performance.
The space saving is significant with a reduced index. A new index
entry requires only 15 characters, a savings of up to 135 characters,
so index insertions, deletions, and modifications are now likely to
be much faster. Note that for TEXT and
BLOB attribute types, a prefix must be taken when
indexing, because indexing the entire attribute is impractical and
isn''t permitted by the MySQL server.
15.9.2 Design Tips
Careful
index design is one technique that improves speed and reduces
resource requirements. However, design of your database, tables,
attributes, and queries is also important. As discussed previously,
accessing a hard disk is slow and is usually the bottleneck in
database server performance. Therefore, most techniques described in
this section improve performance by minimizing disk space and disk
use.
Reducing disk space requirements improves both disk seek and read
performance. Disk read performance is improved because less data is
required to be transferred, while seek performance is improved
because the disk head has to move less on average when randomly
accessing a smaller file than when accessing a larger file.
Here are some simple ways to improve database server performance:
Carefully choose attribute types and lengths. Where possible, use
small variants such as SMALLINT or
MEDIUMINT rather than the regular choice
INT. When using fixed-length attributes, such as
CHAR, specify a length that is as short as
possible.
Use fixed-length attributes: try to avoid the types
VARCHAR, BLOB, and
TEXT. While fixed-length text attributes may waste
space, scanning fixed-length rows is much faster than scanning
variable-length rows (and, as discussed in Section 15.6. the MyISAM table type
adjusts its structure for speed when variable-length types
aren''t used).
MySQL can''t join tables using an index if the
attributes are different types or (in some cases) have different
lengths. You should use ALTER TABLE to change the
attribute types and lengths so that they match. However, MySQL can
join char and varchar
attributes, as long as they have the same declared length. You can
check what''s happening with the
EXPLAIN statement.
Create a statistics table if aggregate functions such as
COUNT( ) or SUM( ) are
frequently used in queries that contain WHERE
clauses and are on large tables. A statistics table usually stores
only one row that is manually updated with the aggregate values of
another table.
For example, suppose you want to create a statistics table that
tracks the number of rows in the customer table.
You would create it with the following statement:
CREATE TABLE custCount (custCount int(5));
It doesn''t need a primary key because
there''s only going to be one row of data in the
table. You would then initialize the statistics table to the current
count of customers using:
INSERT INTO custCount SELECT count(*) FROM customer;
From then on, you''d use the new table to check the
count of customers. For example:
SELECT custCount from custCount;
If a row is deleted from the customer table, you
need to update the statistics table:
UPDATE custCount SET custCount = custCount - 1;
Similarly, if a new row is added, you add one to the counter. For
large tables, this technique is often faster than calculating
aggregate functions with the slow built-in functions that require
complete processing of all rows.
If you''re inserting large numbers of rows, list the
values in one (or few) insert statements as this is much faster to
process. For example, convert:
INSERT INTO table (1, "cat");
INSERT INTO table (2, "dog");
into:
INSERT INTO table (1, "cat"), (2, "dog");
If large numbers of rows are deleted from a table, or a table
containing variable-length attributes is frequently modified, disk
space may be wasted. MySQL doesn''t usually remove
deleted or modified data; it only marks the location as being no
longer in use. Wasted space can affect access speed.
To reorganize a table, use the OPTIMIZE
TABLE command discussed in Section 15.3.3. It should be used
periodically (perhaps once per month).
MySQL uses statistics about a table to make decisions about how to
optimize each query. You can update these statistics by running:
ANALYZE TABLE customer;
You don''t need to do this often.
Use the Heap table type discussed in "Table
Types" for small tables that are searched only for
exact matches using = or
<=>.
Section 5.2.13 of the MySQL manual includes other excellent ideas for
simple performance improvement.
15.9.3 Server Tuning Tips
Comprehensive database tuning is a complex topic that fills many
books. We include in this section only a few practical ideas to help
you to begin to improve the performance of a database system. You can
refer to the books in Appendix G for more
information and also read Section 5.5 of the MySQL manual.
MySQL includes is the
mysqladmin
tool for database administration. Details
of the system setup can be found by running the following command in
a Unix shell:
% /usr/local/mysql/bin/mysqladmin -uroot -ppassword variables
In Microsoft Windows, type the following into the Run dialog
that''s accessible through the Start menu:
"C:\Program Files\EasyPHP1-7\mysql\bin\mysqladmin.exe" -uroot
-ppassword variables
Both commands assume you''ve followed our
installation instructions in Appendix A
through Appendix C.
This shows, in part, the following selected system parameters:
join_buffer current value: 131072
key_buffer current value: 8388600
net_buffer_length current value: 16384
record_buffer current value: 131072
sort_buffer current value: 2097144
table_cache current value: 64
Some of the important parameters are those that impact disk use.
MySQL has several main-memory buffer parameters that control how much
data is kept in memory for processing. These include:
The record_buffer for scanning all rows in a table
The sort_buffer for ORDER BY
and GROUP BY operations
The key_buffer for storing indexes in main memory
The join_buffer for joins that
don''t use indexes
In general, the larger these buffers, the more data from disk is
cached or stored in memory and the fewer disk
accesses are required. However, if the sum of these parameters is
near to exceeding the size of the memory installed in the server, the
operating system will start to swap data between disk and memory, and
the MySQL server will be slow. In any case, careful experimentation
based on the application is likely to improve server performance.
Section 5.5.2 of the MySQL manual suggests parameter settings when
starting the MySQL server. First, for machines with more than 256 MB
of free memory, large tables in the database, and a moderate number
of users, start your MySQL in Unix with:
% /usr/local/mysql/bin/mysqld_safe -O key_buffer=64M -O table_cache=256 \
-O sort_buffer=4M -O read_buffer_size=1M &
The following setting is appropriate for an application such as the
online winestore, because many users are expected, the queries are
largely index-based, and the database is small:
mysqld_safe -O key_buffer=512k -O sort_buffer=16k \
-O table_cache=32 -O read_buffer_size=8k -O net_buffer_length=1K &
There are two other parameters used in this example that
we''ve not discussed. The
table_cache parameter manages the maximum number
of open tables per user connection, while the
net_buffer parameter sets the minimum size of the
network query buffer in which incoming queries are kept before they
are executed.
The SHOW STATUS and SHOW
VARIABLES commands that are described at the beginning of
this chapter can be used to report on MySQL''s use
and behavior. SHOW VARIABLES does the same thing
as the mysqladmin command at the beginning of
this section. SHOW STATUS gives a brief
point-in-time summary of the server status and can help find more
about the number of user connections, queries, and table use. This is
useful input into changing the startup parameters
we''ve just described.
15.9.4 Query Caching
MySQL 4 features an optional query
cache. When you activate it, the results of queries are stored in a
memory buffer. If an identical query arrives later, the results are
returned from the cache rather than the query being rerun. This is an
excellent feature if your application runs many identical queries and
your database doesn''t change too often. Typically,
this makes it an ideal tool for web database applications, and we
recommend you use it for your applications; if you followed our Unix
installation instructions in Appendix A
through Appendix C, you''ve already
enabled the query cache with default parameters.
Consider an example. In our online winestore, the following query is
executed every time any user visits the homepage. The query finds the
latest three wines that have been stocked at the winestore and have
been reviewed by a wine writer:
SELECT wi.winery_name, w.year, w.wine_name, w.wine_id, w.description
FROM wine w, winery wi, inventory i WHERE w.winery_id = wi.winery_id
AND w.wine_id = i.wine_id AND w.description IS NOT NULL
GROUP BY w.wine_id ORDER BY i.date_added DESC LIMIT 3;
The query isn''t fast to run: it uses three tables,
two join conditions, a conditional check for a description, a
GROUP BY clause, an ORDER BY
clause, and the LIMIT modifier. However,
it''s an ideal candidate for query caching: the
homepage is popular and new wines, wine reviews, and inventory are
infrequently added. From simple experiments with our online winestore
from the first edition of this book, we''ve found
that adding query caching makes visiting the homepage almost three
times faster.
Query caching has several features:
It''s configurable so that you can control which
queries are cached when you write the queries (more later in this
section).
It can be tuned: you can control how much memory is used in total,
the maximum size of a result set in the cache, and the size of memory
blocks that are allocated in the cache (more on this next).
It''s sensitive to data or table structure changes:
if they change, any query that uses them is automatically flushed
from the cache.
However, there are some situations in which it isn''t
useful and you need to be careful:
If you rarely execute the same query, most queries are first checked
against the cache and then executed anyway. This adds an overhead to
most queries.
If your data changes frequently, the cached queries have to be
frequently flushed from the cache, and this slows querying down. If
any row changes in a table, a query using that table is flushed (even
if it doesn''t access the row).
To use the cache, a query must be identical byte for byte with a
query that''s in the cache. So, for example,
SELECT * FROM customer and SELECT * FROM
CUSTOMER are treated as different queries, and the latter
won''t find the former in the cache. Even additional
whitespace renders two queries different.
Queries that use functions that should return different results
don''t use the cache. Most of these are obvious:
system time functions, system date functions, locking functions, and
so on. A complete list is in Section 6.9.1 of the MySQL manual.
15.9.4.1 Configuring query caching
Query caching is off by default when you install MySQL 4.1. To turn
it on, you need to edit your MySQL configuration file that you
installed when following the installation procedure in Appendix A through Appendix C. In
a Unix environment, edit the file /etc/my.cnf,
and in Microsoft Windows edit the my.ini file in
C:\winnt for Windows 2000/2003/NT or
C:\windows for Windows XP. Find the section that
beings with the heading:
# The MySQL server
[mysqld]
Add the following statements to the end of that section:
query_cache_size = 16M
query_cache_type = 1
query_cache_min_res_unit = 4K
query_cache_limit = 1M
You need to add the first parameter
query_cache_size that defines how much memory the
cache uses (we''ve decided on 16 MB): the default is
0, which disables caching. The remaining three statements are
optional, but we''ve included them with their default
settings anyway. The query_cache_type parameter
defines whether caching is off (0), on (1), or only used when you ask
for it (2); we discuss how to control which queries are cached in the
next section. The third parameter
query_cache_min_res_unit defines the minimum
memory block size for a cached result set, and the default of 4 KB.
works well. The last parameter query_cache_limit
defines the maximum size of a result set that can be cached, and the
1 MB default is a sensible choice.
After you''ve made the changes, save the file, and
restart your MySQL using the method described for your platform in
Appendix A through Appendix C. Alternatively, reboot your machine.
You now have caching enabled. If you repeat a query such as
SELECT * FROM customer twice or
more, you can check that the cache is in action by using the
SHOW STATUS command from the
command interpreter. This outputs, in part, the following:
| Qcache_queries_in_cache | 1 |
| Qcache_inserts | 1 |
| Qcache_hits | 3 |
| Qcache_lowmem_prunes | 0 |
| Qcache_not_cached | 0 |
| Qcache_free_memory | 16709128 |
| Qcache_free_blocks | 1 |
| Qcache_total_blocks | 4 |
In our example, you can see that our system has just started up:
there''s one query in the cache, only one query has
ever been inserted, it''s been re-run three times,
and no queries that couldn''t be cached have ever
been run. There''s also plenty of memory free.
15.9.4.2 Controlling query caching
After you''ve got caching turned on, you can control
whether an individual query is cached or not. If you
don''t prevent caching, and the
query_cache_type parameter is set to its default
of 1, all queries are cached (with the exception of those queries
that it doesn''t make sense to cache, as discussed
previously). If the query_cache_type parameter is
set to 2, only those queries you ask to be cached will be cached.
Here''s an example of how to explicitly cache a query:
SELECT SQL_CACHE * FROM customer;
Here''s an example of how not to cache a query:
SELECT SQL_NO_CACHE * FROM customer;
Caching only works with SELECT queries. It
doesn''t make sense to cache dynamic
DELETE, INSERT, or
UPDATE queries (or their variants such as
TRUNCATE).