6.4 Techniques
With
all the basic theory and recommendations covered,
it's time to get down to business. When you notice
your server is slow, what can you do about it? How do you locate the
bottlenecks? What tools are available? What's the
thought process?
The first step is to identify the
type of bottleneck at the operating-system level. Using standard
operating system tools, try to determine which of the
server's resources are being taxed. Using
top,
vmstat,
or the Windows Task Manager, check the
machine's CPU utilization. If it's
at or near 100%, it's obviously CPU-bound. Use
top to verify which processes are responsible
for the bulk of the CPU utilization. (If you don't
have much experience with operating-system performance tools, consult
a qualified system administrator.)
If MySQL is indeed consuming lots of CPU time, there are several
techniques you can employ in an attempt to reduce the CPU load.
See Section 6.4.2 later in
this chapter. If the processes using the bulk of the CPU time
aren't mysqld, you clearly have
to solve a problem unrelated to MySQL. Perhaps it's
a runaway process or simply something that should be moved to another
machine. Either way, it's not a MySQL issue, so the
problem is "solved" from our point
of view.
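If mysqld is the process burning the CPU, it also helps to look inside the server: SHOW PROCESSLIST (or SHOW FULL PROCESSLIST, which doesn't truncate long queries) lists every connected thread along with the statement it's currently running, which usually points at the offenders quickly:
mysql> SHOW FULL PROCESSLIST;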
If the CPU is very busy but there doesn't appear to
be any obvious process or group of processes using a large amount of
CPU time, look at the division between system and user time. If
there's an unusually high amount of time being spent
on system (kernel) tasks, that may be a sign of
a MySQL configuration problem or something completely unrelated. See
Section 6.4.4 later in this chapter
for an example of why MySQL might be working the kernel too hard.
If the CPU is relatively idle because it's
frequently waiting for the disks, see Section 6.4.1. You'll
know this because of the higher-than-normal I/O wait and disk
activity numbers you see in vmstat and/or iostat. If
the CPU is waiting on disk I/O because of swapping activity, however,
go to Section 6.4.3.
6.4.1 Solving I/O Bottlenecks
Disk (I/O) bottlenecks
tend to be the most common MySQL performance problem.
They're typically caused by inefficient
queries, meaning that MySQL has to read too many rows to locate
the data you're interested in. Usually that means
your queries aren't using an index, or
they're using an index that's not
terribly effective for this particular query. Before going much
further, be sure you've reviewed Chapter 5.
Diagnosing a query that's not using an index is
relatively easy. If you've enabled the slow query
log (see Section 5.3
in Chapter 5) and set
log-long-format, MySQL automatically logs any
query that doesn't use an index. Those queries are the place to
start: use EXPLAIN on each one, and do simple
benchmarks when you have more than one way to write a given query.
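For instance, here's a minimal sketch of that comparison against a hypothetical orders table with an index on its order_date column (a DATE). The first form wraps the indexed column in a function, which generally prevents MySQL from using the index; the second expresses the same condition as a range, which can use it:
mysql> explain select * from orders
    ->  where year(order_date) = 2003 \G
mysql> explain select * from orders
    ->  where order_date between '2003-01-01' and '2003-12-31' \G
Compare the rows estimates in the two EXPLAIN outputs, then time both forms on your own data to confirm the winner.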
After you've looked at any slow queries and fixed
them, the next things to look at are more subtle issues. In some
cases, queries do use an index and run relatively fast, so MySQL
never considers them to be slow, but it's actually
the wrong index from a performance point of view. There may be an
alternative index MySQL can use to further decrease the I/O required.
6.4.1.1 Wrong index
Finding queries that use
the wrong index can be more of a challenge. It requires an intimate
understanding of your data and the queries being run against it. A
real-world example may help to illustrate how subtle the problem can
be.
Jeremy uses the mod_log_sql Apache module to
record all his web site hits into a MyISAM table named
access_jeremy_zawodny_com. The table is roughly
1.3 GB in size, contains over 6 million records, and looks like this:
+------------------+----------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------+----------------------+------+-----+---------+-------+
| agent | varchar(255) | YES | MUL | NULL | |
| bytes_sent | int(10) unsigned | YES | | NULL | |
| child_pid | smallint(5) unsigned | YES | | NULL | |
| cookie | varchar(255) | YES | | NULL | |
| request_file | varchar(255) | YES | | NULL | |
| referer | varchar(255) | YES | | NULL | |
| remote_host | varchar(50) | YES | MUL | NULL | |
| remote_logname | varchar(50) | YES | | NULL | |
| remote_user | varchar(50) | YES | | NULL | |
| request_duration | smallint(5) unsigned | YES | | NULL | |
| request_line | varchar(255) | YES | | NULL | |
| request_method | varchar(6) | YES | | NULL | |
| request_protocol | varchar(10) | YES | | NULL | |
| request_time | varchar(28) | YES | | NULL | |
| request_uri | varchar(255) | YES | MUL | NULL | |
| server_port | smallint(5) unsigned | YES | | NULL | |
| ssl_cipher | varchar(25) | YES | | NULL | |
| ssl_keysize | smallint(5) unsigned | YES | | NULL | |
| ssl_maxkeysize | smallint(5) unsigned | YES | | NULL | |
| status | smallint(5) unsigned | YES | | NULL | |
| time_stamp | int(10) unsigned | YES | MUL | NULL | |
| virtual_host | varchar(50) | YES | | NULL | |
+------------------+----------------------+------+-----+---------+-------+
There are separate indexes on four columns: agent,
time_stamp, request_uri, and
remote_host. The intention is to provide an
efficient way to produce statistics based on time, user agent
(browser), the document fetched (request_uri), or
the client (remote_host). Notice the MUL entries in the
Key column for each of those fields.
Most queries ran very quickly, but one particular query was
problematic. It seemed to run longer than expected. After repeated
execution and watching vmstat output, it became
clear that a lot of time was spent waiting on the disk. The query
attempts to find out which documents a given client has requested
during a particular time range, usually a single day. It is run
once for every client that requested anything in the past day. The
request looks like this:
select request_uri from access_jeremy_zawodny_com
where remote_host = '24.69.255.236'
and time_stamp >= 1056782930
and time_stamp <= 1056869330
order by time_stamp asc
Running the query through EXPLAIN proved to be
quite interesting:
mysql> explain select request_uri from access_jeremy_zawodny_com
-> where remote_host = '24.69.255.236'
-> and time_stamp >= 1056782930
-> and time_stamp <= 1056869330
-> order by time_stamp asc \G
*************************** 1. row ***************************
table: access_jeremy_zawodny_com
type: ref
possible_keys: time_stamp,remote_host
key: remote_host
key_len: 6
ref: const
rows: 4902
Extra: Using where; Using filesort
1 row in set (0.00 sec)
MySQL chose to use the index on remote_host. But
it doesn't always make that choice. Sometimes it
decides to use the index on time_stamp.
Here's an example:
mysql> explain select request_uri from access_jeremy_zawodny_com
-> where remote_host = '67.121.154.34'
-> and time_stamp >= 1056782930
-> and time_stamp <= 1056869330
-> order by time_stamp asc \G
*************************** 1. row ***************************
table: access_jeremy_zawodny_com
type: range
possible_keys: time_stamp,remote_host
key: time_stamp
key_len: 5
ref: NULL
rows: 20631
Extra: Using where
1 row in set (0.01 sec)
The only difference between those two queries is the IP address
we're looking for. In each case,
MySQL's query optimizer estimates the number of rows
it will need to read to satisfy the query using each possible index.
In the first example, it decides that there are fewer records with a
remote_host of 24.69.255.236
than there are records in the specified 24-hour time range. In the
second example, it does just the opposite, deciding the time range
will result in fewer rows to read.
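Those estimates come in part from the index statistics MySQL keeps for the table; you can inspect them with SHOW INDEX (the Cardinality column) and refresh them with ANALYZE TABLE. A quick sketch against the table from this example:
mysql> SHOW INDEX FROM access_jeremy_zawodny_com;
mysql> ANALYZE TABLE access_jeremy_zawodny_com;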
Experiment with various IP addresses, and it
doesn't take long to find one for which MySQL makes
the wrong choice. It chooses the remote_host index
when using the time_stamp index is actually
faster, even though the remote_host index requires
reading fewer rows.[4] How is that possible?
[4] Using a USE
INDEX specification in the query, you can test the
performance of either index.
The underlying assumption is that all rows cost roughly the same
amount of time to read. But this is a case in which
that's not always true. Consider how the data will
be stored in this MyISAM table. Apache is logging requests to the
table all the time and has been doing so for over a year. Rows are
never removed, so the data is already sorted by timestamp in the
table and on disk (assuming minimal fragmentation).
Once you have a nontrivial amount of information in a table like
this, the rules change a bit. If we assume that the records for a
given IP address are evenly distributed among the millions of
records, it's clear that using the
remote_host index may result
in many more disk seeks. And since disk seeks are slower than reading
consecutive blocks from disk, it follows that MySQL may be doing less
work (evaluating fewer rows) but the disk is doing more
work, using up precious seek time that may slow down other queries
too.
In logging applications, where you're frequently
querying based on a time range as well as another indexed field, this
problem is quite common and has no good generalizable solution. If
you have some insight into your data and can add it to the software
that writes the queries, that can help a lot. The software could be
configured to tell MySQL which index to use. For example, if your
software knows that a given IP address has shown up only rarely
during the time range being queried, it can force MySQL to use the
time_stamp index:
SELECT ... USE INDEX (time_stamp) ...
It's not the ideal solution, but it is quite
effective when used appropriately.
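Applied to the example query from earlier, the hint sits directly after the table name (the IP address and timestamps are simply the ones used above):
select request_uri from access_jeremy_zawodny_com use index (time_stamp)
where remote_host = '24.69.255.236'
and time_stamp >= 1056782930
and time_stamp <= 1056869330
order by time_stamp asc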
6.4.1.2 Temporary tables
Another problem that
doesn't show up in the slow query log is an
excessive use of disk-based temporary tables. In the output of
EXPLAIN, you'll often see
Using temporary. It indicates
that MySQL must create a temporary table to complete the query.
However, it doesn't tell you whether that temporary
table will be in memory or on disk. That's
controlled by the size of the table and MySQL's
tmp_table_size variable.
If the space required to build the temporary table is less than or
equal to tmp_table_size, MySQL keeps it in memory
rather than incur the overhead and time required to write the data to
disk and read it again. However, if the space required exceeds
tmp_table_size, MySQL creates a disk-based table
in its tmpdir directory (often
/tmp on Unix systems). The default
tmp_table_size is 32 MB.
To find out how often that happens, compare the relative sizes of the
Created_tmp_tables and
Created_tmp_disk_tables counters:
mysql> SHOW STATUS LIKE 'Created_tmp_%';
+-------------------------+-------+
| Variable_name | Value |
+-------------------------+-------+
| Created_tmp_disk_tables | 18 |
| Created_tmp_tables | 203 |
| Created_tmp_files | 0 |
+-------------------------+-------+
If you create a lot of disk-based temporary tables, increase the size
of tmp_table_size if you can do so safely. Keep in
mind that setting the value too high may result in excessive swapping
or MySQL running out of memory if too many threads attempt to
allocate in-memory temporary tables at the same time. Otherwise, make
sure that tmpdir points to a very fast disk
that's not already doing lots of I/O.
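Here's a minimal sketch of checking the current limit and raising it on a running server; the 64-MB figure is purely an illustration, and changing it on the fly assumes a server new enough (4.0.3 or later) to support SET GLOBAL and an account with the SUPER privilege:
mysql> SHOW VARIABLES LIKE 'tmp_table_size';
mysql> SET GLOBAL tmp_table_size = 64 * 1024 * 1024;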
As a last resort, consider using a tmpfs (or ramdisk, or mdmfs, or
whatever your OS calls memory-backed filesystems) and setting
$TMPDIR to point there when starting MySQL.
6.4.1.3 Caching
If
your queries are already optimized and using the most efficient
indexes, it's still possible to run into I/O
bottlenecks at some point. Simply running too many queries, no matter
how efficient they are, can become too much for the disk(s) to keep
up with. If so, it's time to consider caching.
The easiest thing to do is make sure you're using
the MySQL query cache. Available since MySQL 4.0, the query cache
keeps the results of frequently executed SELECTs
in memory so that MySQL doesn't need to perform any
disk I/O at all. See Section 5.4.4 in Chapter 5 for more information.
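A quick way to verify that the cache is enabled and is actually absorbing queries is to look at its settings and hit counters:
mysql> SHOW VARIABLES LIKE 'query_cache%';
mysql> SHOW STATUS LIKE 'Qcache%';
If Qcache_hits stays near zero while Qcache_inserts climbs, the cache isn't earning its keep for your workload.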
Taking things a step further, you might consider application-level
caching. If there's data that
doesn't change frequently at all, query for it once
in a while and store it in memory or on local disk until you requery
for it.
6.4.1.4 Spread the load
If
you've already covered the causes listed earlier and
implemented the suggestions, it's likely that you
need to spread the I/O load more effectively. As described earlier,
installing disks with higher rotational speeds and lower seek times may help.
Using RAID (especially RAID 0, RAID 5, or RAID 10) will spread the
work across multiple disks, possibly eliminating or reducing the
bottleneck.
Another option, if you have multiple disks and can't
easily configure RAID, is to attempt to balance the disk I/O
manually. Spend some time with
iostat
or
systat
(depending on your OS) to discover where the bulk of the I/O is
going. If you have all your MySQL data on a single disk, you can try
moving pieces to another disk. If the majority of activity is focused
on a small group of tables, consider moving them to a separate disk.
Another approach is to separate predominantly random I/O from that
which is mostly sequential. Store logs, such as the binary logs,
replication relay logs, and InnoDB transaction logs, on a separate
disk from the actual data files. It's ultimately a
game of trial and error. As with benchmarking, keep a close eye on
the numbers and try not to change too many things at once.
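As a rough sketch of what that separation might look like in my.cnf, assuming hypothetical mount points on three separate disks (adjust the paths and the split to suit your own hardware and workload):
[mysqld]
# table data on one disk
datadir                   = /disk1/mysql/data
# temporary tables on another
tmpdir                    = /disk2/mysql/tmp
# mostly sequential log writes on a third
log-bin                   = /disk3/mysql/logs/binlog
relay-log                 = /disk3/mysql/logs/relay
innodb_log_group_home_dir = /disk3/mysql/logs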
Finally, replication is always an option. If you've
simply outgrown the capacity of a single machine,
it's often the least disruptive solution. See Chapter 7 to learn all about replication.
6.4.2 Solving CPU Bottlenecks
CPU bottlenecks in MySQL
can be difficult to track down. Unlike some database servers, MySQL
currently doesn't provide per-query statistics about
the amount of time spent actually doing work versus waiting for disk
I/O to complete.
Luckily it doesn't have to be a complete guessing
game. If you see a query in the slow query log and suspect that it
may be CPU-bound, simply benchmark it. Pull out a copy of MySQL
super-smack and use it to run the query a few thousand times in a
row. Then, in
another window, watch top,
vmstat, or your favorite system monitoring tool.
If the CPU quickly hits 100% utilization even with a relatively low
number of concurrent queries, the query is very likely CPU-bound.
If you find yourself staring at a very large list of slow queries,
how do you decide which ones to start analyzing? Easy: look for those
that examine a large number of rows (thousands, tens of thousands, or
more), and focus on those that use any of MySQL's
built-in data-manipulation functions. Common suspects are those that:
Format or compare dates
Encrypt data or compute hashes
Perform complex comparisons, such as regular expressions
You'll often find that something as simple as
computing an MD5 hash over millions of values per hour is using too
much CPU time. By moving the logic into the application servers that
query the database, you'll free up CPU time for
work that only MySQL can do efficiently.
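MySQL's BENCHMARK() function makes it easy to see how much CPU such a function burns on your hardware; the iteration count and input string here are arbitrary:
mysql> SELECT BENCHMARK(1000000, MD5('some arbitrary string'));
The return value is always 0; the interesting number is the elapsed time the client reports.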
If you can't easily ask MySQL to do less work by
moving logic into the application layer, you always have the option
of throwing hardware at the problem. You can do this in one of two
ways. You might simply upgrade the CPUs in your server or add more
CPUs if there's room. Alternatively, you may find it
less expensive and more scalable to add new servers, replicate the
data to them, and spread the load among them.
There's nothing wrong with using
Moore's Law to your advantage once in a while.
High CPU utilization with MyISAM tables isn't
necessarily bad. It may simply mean that your queries are hitting
tables that have been entirely cached in the operating
system's cache.
It's certainly better than reading from disk, but
each time MySQL has to ask the OS for a block of data,
that's CPU time that could be better spent
processing the rest of the query. Moving to InnoDB or BDB tables lets
MySQL cache table data itself, so it doesn't have to
ask the OS for records.
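If you decide to convert, it's a single statement per table, though it rewrites the whole table and can take a while on a large one. A sketch, using a hypothetical table name and the 4.0-era TYPE syntax (later versions spell it ENGINE=InnoDB):
mysql> ALTER TABLE hits TYPE=InnoDB;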
6.4.3 Solving Memory Bottlenecks
Tuning memory usage on MySQL
servers can be a delicate balancing act. As explained earlier, MySQL
has some global memory buffers in addition to a number of per-thread
buffers. The trick is to balance the performance gains that come from
having large global buffers against the need to service a particular
number of concurrent users. At a minimum, you should have enough
memory available to handle MySQL's global buffers
plus the per-thread buffers multiplied by the maximum number of
concurrent connections you will use.
Expressed mathematically, that is:
min_memory_needed = global_buffers + (thread_buffers * max_connections)
where thread_buffers includes the following:
sort_buffer, myisam_sort_buffer, read_buffer, join_buffer, and read_rnd_buffer
and global_buffers includes:
key_buffer, innodb_buffer_pool, innodb_log_buffer,
innodb_additional_mem_pool, and net_buffer
We say that's the minimum memory required because
ideally you'd like some left over for the operating
system itself to use. In the case of MyISAM tables,
"spare" memory will often be put to
use caching records from MyISAM data (.MYD)
files.
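To plug real numbers into that formula, you can read the current settings straight from the running server:
mysql> SHOW VARIABLES LIKE '%buffer%';
mysql> SHOW VARIABLES LIKE 'max_connections';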
In addition to any memory the threads may allocate in the process of
handling queries, the threads themselves also require a bit of memory
simply to exist. The thread_stack variable
controls this overhead. On most platforms, 192 KB is the default
value.[5]
[5] If you happen to be using LinuxThreads on
FreeBSD, the value is hardcoded in the LinuxThreads library. Changing
MySQL's thread_stack setting will
have no effect. You must recompile the library to change the stack
size.
A likely problem is typified by an all-too-common scenario. Imagine
you have a server with 1 GB of memory hosting a mix of MyISAM and
InnoDB tables, mostly MyISAM. To get the most bang for your
buck, you configure a 512-MB key_buffer after
watching the key efficiency in mytop (see Appendix B) and a 256-MB
innodb_buffer_pool after checking the buffer pool
and memory statistics from SHOW INNODB
STATUS (see Appendix A). That
leaves 256 MB that is used to cache data files at the operating
system level as well as the per-thread buffers that are allocated on
an as-needed basis. The MySQL server handles a relatively small
number of concurrent users, maybe 20-50 most of the time, and the
per-thread buffer settings are all left at their default sizes.
Things work very well until a few new applications are built that
also use this MySQL server. These new applications need a significant
number of concurrent connections. Instead of 20-50 connections on
average, the server is handling 300-400. When this happens, the odds
of several connections needing to allocate a per-thread buffer (such
as the sort_buffer) at the same time increase
quite a bit.
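Plugging this scenario into the formula from the previous section shows how little headroom is left. Assume, purely for illustration, that a busy connection allocates about 2 MB of per-thread buffers:
global_buffers = 512 MB (key_buffer) + 256 MB (innodb_buffer_pool) = 768 MB
worst_case     = 768 MB + (2 MB * 400 connections) = roughly 1.5 GB
That's half again as much as the 1 GB in the machine, so it takes only a fraction of those connections allocating buffers at once to push the box into swap.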
This can lead to a particularly nasty series of events. If a large
number of threads need to allocate additional memory,
it's probably because the server is handling a heavy
query load. That can cause MySQL to allocate so much memory that the
operating system begins swapping, which causes performance to degrade
further, which means that each query takes longer to complete. With
queries running more slowly, the odds of more threads needing memory
increase. It's a vicious spiral.
The only solution is to restore balance between the
system's memory and MySQL's memory
needs. That means doing one of the following:
Add more memory
Decrease max_connections
Decrease some of the per-thread buffer sizes
Be proactive. Monitor memory use on your servers. Do the math to
ensure that in the worst case (hitting
max_connections and each thread allocating
additional memory), you'll still have a bit of
breathing room.
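One easy check is to compare the connection high-water mark against the limit you've budgeted for:
mysql> SHOW STATUS LIKE 'Max_used_connections';
mysql> SHOW VARIABLES LIKE 'max_connections';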
6.4.4 Solving Kernel Bottlenecks
Though
it's not common, you may find that MySQL
doesn't appear to be using an overwhelming amount of
CPU time, yet the machine is rather busy. There's
little idle CPU. Upon looking at it more closely, you find that quite
a bit of the time is spent in
"system" rather than
"user" or
"idle." That's
likely a sign that MySQL is doing something unusual to exercise the
kernel, usually by creating and destroying threads.
This happened at Yahoo! during the launch of a new web site. In
September 2002, engineers were scrambling to create a September 11th
memorial web site known as
remember.yahoo.com.[6] On it, anyone could create a memorial
"tile" by selecting a graphic and
adding a customized message. The tile was then viewable by anyone
visiting the site. To get the job done as quickly as possible, it was
constructed using standard open source tools, including FreeBSD,
Apache, PHP, and MySQL.
[6] The entire
site was conceived, designed, built, and launched in roughly two
weeks using the spare time of a handful of Yahoo's
engineers.
The architecture was relatively straightforward, but
we'll simplify it a bit to focus on the main point.
A group of frontend web servers was configured to connect to a slave
server by way of a hardware load balancer. Using the slave
connection, the web servers could pull the information necessary to
display the tiles. When a visitor created a tile, however, the web
server needed to connect to the master to insert several records. The
master was a beefier machine: dual 1.2-GHz CPUs, 2 GB of RAM, and a
SCSI hardware RAID 5 disk array.
At its peak, there were roughly 25-30 web servers that needed to work
with the master. Each server was configured to run roughly 30-40
Apache processes. That meant the master would need to support over
1,000 concurrent clients. Knowing that could tie up substantial
resources on the master, the designers opted for a simplified
approach. Unfortunately, the web application (written in PHP) was
configured to use persistent connections. So, to keep connection
numbers down on the master, the wait_timeout was
set very low, to roughly 10 seconds.
By and large, it worked. Idle connections were dropped after 10
seconds. The number of connections on the master remained below 200,
leaving lots of resources free. But there was a problem: the CPUs in
the master were quite busy. Most of the time there was less than 10%
idle time, and nearly 50% of the CPU time was being spent on system
(rather than user) tasks.
After an hour or so of head-scratching, looking at system logs and
the output of SHOW STATUS, a
light finally flickered on in Jeremy's head. The
value of Threads_created was very large and
increasing at an alarming rate. The kernel was so busy creating and
destroying threads that it was eating into MySQL's
ability to use the CPUs effectively.
With that realization, the solution was easy. Increasing the
thread_cache from its default value of 0 to
roughly 150 resulted in an instant improvement. The system CPU time
dropped to roughly 10%, thus freeing up quite a bit of CPU time for
MySQL to use. As it turns out, MySQL didn't need it
all, so the machine ended up with 20% idle time: breathing
room.
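To spot the same problem on your own servers, compare Threads_created with the total number of connection attempts the server has handled. If the two counters climb in lockstep, nearly every new connection is paying the cost of spawning a fresh thread, and a larger thread cache will likely help:
mysql> SHOW STATUS LIKE 'Threads_created';
mysql> SHOW STATUS LIKE 'Connections';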