3.3 Benchmarking Tools
In
this chapter we'll introduce three useful
benchmarking tools:

The MySQL Benchmark Suite, which is useful for making comparisons
between different database engines or different installations of one
database engine. It isn't meant to benchmark your
site-specific data or needs.

MySQL super-smack, a stress-testing tool.

MyBench, a tool developed in Perl by one of the authors. It is
another stress-testing tool that is easier to customize and extend
than super-smack.
3.3.1 The MySQL Benchmark Suite
The MySQL distribution comes with a rather
comprehensive set of generic tests that have been bundled together so
you can run them as a group and examine the results. The tests will
do little to help you figure out whether a configuration change will
speed up your application. But they're very helpful
when used as a high-level benchmark, meaning they provide a good
overall indication of how well one server performs relative to
another.

You can also run the tests individually if you'd
like to compare a subset of the results from several servers. If
you're mainly interested in
UPDATE speed, run one of the
UPDATE-intensive tests a few times on each server.

The benchmark suite can be used to test non-MySQL servers as well. According
to the README, PostgreSQL, Solid, and mSQL have
been tested. This may be helpful if you're trying to
choose between MySQL and PostgreSQL. All the benchmark code is
relatively generic Perl using the DBI and Benchmark modules. If
needed, you can add support for nearly any database server that has a
DBI driver (Oracle, Sybase, Informix, DB2, etc.). If you do so, be
sure to look at bench-init.pl for any global
options you may need to add or change.

By running the benchmarks against several different servers,
you'll get an idea of how much faster one server is
than another. The tests are largely CPU-bound, but there are portions
of the test that demand a lot of disk I/O (for short times).
You'll likely find that a 2.4-GHz CPU
doesn't necessarily make MySQL run twice as fast as
a 1.2-GHz CPU.

The benchmark suite will not help you test the
benefits of multi-CPU machines because the benchmark process is
completely serialized. It executes one query after another, so MySQL
will not benefit from the addition of a second CPU. To test that,
you'll need to use MySQL super-smack or a home-grown
solution. Both are covered in the following sections.

To run the tests, use the run-all-tests script
located in the sql-bench directory. Be sure to
read the README in that directory. It provides a
complete list of the command-line options you can use.
$ cd sql-bench
sql-bench$ ./run-all-tests --server=mysql --user=root --log --fast
Test finished. You can find the result in:
output/RUN-mysql_fast-Linux_2.4.18_686_smp_i686
The benchmarks may take quite a while to run, depending on your
hardware and configuration. On a dual 933-MHz Pentium 3, it took over
an hour to execute the tests using MySQL 4.0.13.
While
it's running, however, you can watch the progress.
The --log flag causes results from each test to be
logged in a subdirectory named output. Each file
contains a series of timings for the various operations in each
benchmark test. Here's a small sampling, slightly
reformatted for printing:
sql-bench/output$ tail -5 select-mysql_fast-Linux_2.4.18_686_smp_i686
Time for count_distinct_group_on_key (1000:6000):
34 wallclock secs ( 0.20 usr 0.08 sys + 0.00 cusr 0.00 csys = 0.28 CPU)
Time for count_distinct_group_on_key_parts (1000:100000):
34 wallclock secs ( 0.57 usr 0.27 sys + 0.00 cusr 0.00 csys = 0.84 CPU)
Time for count_distinct_group (1000:100000):
34 wallclock secs ( 0.59 usr 0.20 sys + 0.00 cusr 0.00 csys = 0.79 CPU)
Time for count_distinct_big (100:1000000):
8 wallclock secs ( 4.22 usr 2.20 sys + 0.00 cusr 0.00 csys = 6.42 CPU)
Total time:
868 wallclock secs (33.24 usr 9.55 sys + 0.00 cusr 0.00 csys = 42.79 CPU)
As you can see, the count_distinct_group_on_key
(1000:6000) test took 34
"wallclock" seconds to execute.
That's the total amount of time the client took to
run the test. The other values (usr,
sys, cusr,
csys) that added up to 0.28 seconds constitute the
overhead for this test. That's how much of the time
was spent running the benchmark client code rather than waiting for
the MySQL server's response. This means that the
figure we care about (how much time was tied up by things
outside the client's control) totaled 33.72
seconds.

It's also worth noting that you can run the tests
individually if you need to. Rather than rerun the entire suite, you
may decide to focus on the
insert test. By
doing so, you see a bit more detail than was in the summarized files
left in the output directory:
sql-bench$ ./test-insert
Testing server 'MySQL 4.0.13 log' at 2003-05-18 11:02:39
Testing the speed of inserting data into 1 table and do some selects on it.
The tests are done with a table that has 100000 rows.
Generating random keys
Creating tables
Inserting 100000 rows in order
Inserting 100000 rows in reverse order
Inserting 100000 rows in random order
Time for insert (300000):
42 wallclock secs ( 7.91 usr 5.03 sys + 0.00 cusr 0.00 csys = 12.94 CPU)
Testing insert of duplicates
Time for insert_duplicates (100000):
16 wallclock secs ( 2.28 usr 1.89 sys + 0.00 cusr 0.00 csys = 4.17 CPU)
3.3.2 MySQL super-smack
Developed by
Sasha Pachev,
a former MySQL AB employee, super-smack is a stress-testing tool that
can talk to both MySQL and PostgreSQL. The super-smack tool really
deserves wider recognition, because it's very
powerful. Using a simple configuration file syntax, you can define a
series of tests (a query
barrel) to run against your server along with
the data and tables needed to support the tests. When running the
tests, you control how many concurrent clients will be simulated (one
per thread) and how many iterations of each test the clients will
execute, using command-line arguments.

Because the tool simulates many simultaneous users, it works very
well for testing multi-CPU servers. And even on single CPU machines,
it allows you to generate more realistic test scenarios as well as
perform stress tests.

A typical test with super-smack involves creating one or more large
tables and populating them with various data, chosen from an input
file or generated on the fly. It then proceeds to beat on the created
tables using a series of queries that are defined by the user via a
configuration file. The values used in the queries are selected from
an external file in either random or sequential order.

As you'll see, using MySQL super-smack requires more
work than using the supplied benchmarks. While it will take some time
to get super-smack set up and running the first time,
you'll benefit from having much greater control over
the tests. With a little practice, you can create custom-tailored
benchmarks in very little time.

You'll first need to download and build super-smack
before you can begin testing; it doesn't come with
MySQL. As
of this writing, the current release is available from http://jeremy.zawodny.com/mysql/super-smack/.
It uses GNU
autoconf, so the installation process is
relatively simple as long as your build tools are reasonably current.
/tmp$ tar -zxf super-smack-1.1.tar.gz
/tmp$ cd super-smack-1.1
/tmp/super-smack-1.1$ ./configure --with-mysql
... lots of configure output ...
/tmp/super-smack-1.1$ make
... lots of compilation output ...
/tmp/super-smack-1.1$ sudo make install
Be sure to read the MANUAL and
TUTORIAL files included in the distribution.
They cover topics that we may not cover here, especially if
you're using a newer version.

To get started with super-smack, let's look at the
example benchmarks it includes. In
/usr/share/smacks, you'll find
a small collection of smack files:
/usr/share/smacks$ ls -l
total 8
-rw-r--r-- 1 jzawodn jzawodn 3211 Feb 2 2004 select-key.smack
-rw-r--r-- 1 jzawodn jzawodn 3547 Feb 2 2004 update-select.smack
These files contain the commands necessary to populate a table and
execute a bunch of queries against it. Before diving into the
configuration file, let's give it a quick run.
We'll ask it to simulate 30 concurrent users, each
running 10,000 iterations of the test queries.
/usr/share/smacks$ super-smack update-select.smack 30 10000
Error running query select count(*) from http_auth:Table 'test.http_auth' doesn't exist
Creating table 'http_auth'
Loading data from file '/var/smack-data/words.dat' into table 'http_auth'
Table http_auth is now ready for the test
Query Barrel Report for client smacker
connect: max=49ms min=0ms avg= 14ms from 30 clients
Query_type num_queries max_time min_time q_per_s
select_index 300000 10 0 2726.41
update_index 300000 5 0 2726.41
The test requires a table named http_auth to
operate. Since the table didn't exist, the test used
the data in /var/smack-data/words.dat to
populate the table. Then super-smack ran the tests and produced
results.

After the "Query Barrel Report"
line, you can see the performance stats from this benchmark run. (A
query barrel, as you'll see later, is a set of
queries run by super-smack in each iteration.) The first line
provides connection stats, which list the maximum, minimum, and
average connection times for each of the 30 clients, that is,
how long the client waited for the server when establishing a
connection.[2]

[2] The super-smack tool uses persistent
connections. Each client connects once and remains connected for the
duration of the test run. You can't use super-smack
to simulate nonpersistent connections.
The remaining lines provide statistics for each type of test defined
in the smack file. For each, you see the number
of times the query was executed (this should always match what you
specified on the command line), the maximum time the query took, the
minimum time the query took, and the number of queries executed per
second.

Running the same test with fewer clients,
you'll see the performance was actually higher:
3,306 queries/sec versus 2,726 queries/sec.
/usr/share/smacks$ super-smack update-select.smack 5 10000
Query Barrel Report for client smacker
connect: max=2ms min=1ms avg= 1ms from 5 clients
Query_type num_queries max_time min_time q_per_s
select_index 50000 1 0 3306.66
update_index 50000 1 0 3306.66
That's likely because we ran the super-smack client
on the same machine as MySQL, so the two were competing for CPU time.
In real-world testing, you'd probably have the
client and server separated, and you'd want to run
the same benchmark several times to rule out any anomalies.
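For instance, a simple shell loop (just a sketch; use whatever client and iteration counts match your test) makes repeated runs easy to compare:

$ for run in 1 2 3; do super-smack update-select.smack 5 10000; done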
3.3.2.1 Preparing test data
Using the
words.dat data as input works in the
http_auth benchmark, but when testing your
applications, you'll need to supply your own data.
There is no one-size-fits-all answer for how to generate your test
data. You have to determine what data to create or extract for use in
the tests. Once you've done that and loaded the data
into MySQL, you need to extract the relevant values into a file that
super-smack can read during testing.

For example, if you're testing an online product
catalog in which items will be selected based on their product ID,
you'll need a list of product IDs to use during
testing. For a comprehensive test, use all the product IDs. If you
have millions of products, it may be sufficient to test a subset of
them.

In either case, first get a list of the product IDs into a text file
that you can then drop into /var/smack-data/ to
use during the testing. The easiest way to do that is to use
MySQL's SELECT
... INTO
OUTFILE construct:
SELECT id INTO OUTFILE "/tmp/product.dat" FROM product
That produces a file containing one product ID per line, perfect
for use with super-smack. If your test requires multiple columns of
data, you can produce a file of quoted comma-separated values:
SELECT id, type INTO OUTFILE "/tmp/product.dat"
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY "\n"
FROM product
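The resulting file would then contain lines along these lines (the values here are made up; note that only the character column is quoted, thanks to the OPTIONALLY ENCLOSED BY clause):

312,"electronics"
313,"books"
314,"music"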
super-smack allows you to specify a field delimiter to be used for
input files, as you'll see. Also be sure to copy
your file to /var/smack-data/.
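For example, assuming the file created above (you'll likely need root privileges for the copy):

$ sudo cp /tmp/product.dat /var/smack-data/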
3.3.2.2 Configuration
Having
installed and tested super-smack, let's spend some
time dissecting one of the standard smack files.
Along the way, we'll consider how you might adapt
the file to your own testing needs.
Smack files are written in a simple configuration
language that's loosely based on C or Perl. Each
smack file defines several objects that are used
in the main block of the file: clients, tables,
dictionaries, and queries.
client "admin"
{
user "root";
host "localhost";
db "test";
pass ";
socket "/var/lib/mysql/mysql.sock";
}
The first section defines an admin client using
the root account on localhost's
server and assumes there's no password on the
account.[3]

[3] If you don't specify a
password, super-smack does not prompt you for one. We point this out
only because many other MySQL tools prompt you.

If you plan to run super-smack on a remote client, be sure to update the
settings appropriately. The socket should be left
empty (or removed) in that case. If you're running
MySQL on a nonstandard port, specify that in the
client section(s):
port "3307";
Next, define the table and data used for the tests:
table "http_auth"
{
client "admin";
create "create table http_auth
(username char(25) not null primary key,
pass char(25),
uid integer not null,
gid integer not null
)";
min_rows "90000";
data_file "words.dat";
gen_data_file "gen-data -n 90000 -f %12-12s%n,%25-25s,%n,%d";
}
There's a lot going on here. First, we specify that
the table will be created and populated using the
admin user options specified previously. Then we
provide a CREATE TABLE
specification. If the table doesn't already exist,
super-smack creates it. We also specify a minimum number of rows. If
the table exists but doesn't have sufficient rows,
super-smack will drop and recreate the table. Then, if needed, it
will load the data from the words.dat file,
which is expected to live in /var/smack-data.
Finally, if that file doesn't exist, super-smack
uses gen-data (which comes with super-smack) to
create 90,000 rows of random data.

The gen-data command isn't
documented, but as you can see, it requires a number of rows
(-n) and a printf-style format
string (-f). Sample output for our command looks
like:
$ gen-data -n 5 -f %12-12s%n,%25-25s,%n,%d
pajgyycklwiv1,qbnvqtcewpwvxpobgpcgwppkw,1,763719779
epqjynjbrpew2,mhvcdpmifuefqdmjblodvlset,2,344858293
fbntssvvmwck3,cfydxkranoqfiuvyhqvtprmpx,3,2125632375
fcwtayvakrxr4,ldaprgacrwsbujrnlxxsxqwse,4,1513050921
jnaixvfvktpf5,htihaukugfiurnnmxnysypsnr,5,1872952907
super-smack loads the output into the table using the
LOAD DATA command.

In real-life testing, you probably won't be using
super-smack to populate your tables. Instead, you can simply use a
copy of your real data.

Next we have a dictionary definition:
dictionary "word"
{
type "rand";
source_type "file";
source "words.dat";
delim ",";
file_size_equiv "45000";
}
A dictionary is
simply a source for words that will later be used when constructing
queries. It's a simple mechanism that gives you
control over which values are used in queries and how they are used.

The dictionary type can be one of the following:

rand
Values are selected randomly from the list.

seq
Values are used sequentially.

unique
Generate unique values using the same method as
gen-data.

The source_type may be one of the following:

file
A file read from disk.

list
A user-supplied list of words, comma-separated.

template
The format to use when type is
unique. For example,
"jzawodn_%07d" generates values composed of
jzawodn_ and a seven-digit number. (A dictionary using this type is
sketched just after this list.)
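For instance, a dictionary that generates unique values from that template might look like the following sketch (the dictionary name is ours; the stock smack files don't include one like it):

dictionary "fake_user"
{
type "unique";
source_type "template";
source "jzawodn_%07d";
}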
The source is either a filename (assumed to be in the
/var/smack-data directory) or a comma-separated
list of quoted values ("one","two","three") when
using a source_type of list.

If you use a delimited file, the
delim
option tells super-smack which character separates the input fields
in your source file. The file_size_equiv option is
helpful when you have a very large dictionary. Rather than use every
word, super-smack divides the file size by this number. The result is
then used to skip records in the input.

For example, if your file is 100 KB in size and you specify a
file_size_equiv of 10,240, super-smack divides the
two and knows to use only one tenth of the input. It will test using
every tenth value in the source file.

Next are two query definitions, one for a series of
SELECT queries followed by an
UPDATE query generator:
query "select_by_username"
{
query "select * from http_auth where username = '$word'";
type "select_index";
has_result_set "y";
parsed "y";
}
query "update_by_username"
{
query "update http_auth set pass='$word' where username = '$word'";
type "update_index";
has_result_set "n";
parsed "y";
}
The queries are relatively simple. If you'd like to
substitute a word from the dictionary in the query, simply use the
$word placeholder and be sure to set
parsed to y; otherwise
super-smack uses your query as is.

The type is simply a tag or name for this set of
queries. It is reported by name in the final statistics. The
has_result_set option tells super-smack whether
the query returns data.

Next, one more client is defined:
client "smacker"
{
user "test";
pass ";
host "localhost";
db "test";
socket "/var/lib/mysql/mysql.sock";
query_barrel "1 select_by_username 1 update_by_username";
}
Unlike the previous client, this one has a
query_barrel associated with it. The query barrel
defines the order and number of queries the client will run during
each iteration. In this case, we've instructed it to
execute one select_by_username query followed by
one update_by_username query. You can adjust the
numbers to suit your particular testing needs, of course.

Finally, we get to the main section of the smack
file. It controls the actual flow.
main
{
smacker.init( );
smacker.set_num_rounds($2);
smacker.create_threads($1);
smacker.connect( );
smacker.unload_query_barrel( );
smacker.collect_threads( );
smacker.disconnect( );
}
One of the first things to notice is that command-line arguments are
available in shell-style numbered variables ($1,
$2, etc.). So if you'd like to
reverse the order of arguments on the command line, you can do so.

The code's flow is straightforward. It begins by
initializing the smacker client. Then we set the
number of rounds and create the necessary threads. Each thread then
connects to the server and unloads its barrel of queries, keeping
statistics along the way. The collect_threads
function causes the main thread to wait for the others to complete.
The clients then disconnect, and the statistics are reported.

When you look at the setup piece by piece, it's easy
to digest. The same framework works for a wide variety of testing.
The main section rarely changes. And, for very simple tests (such as
a single query), there's no need to define multiple
users. If you are creating a benchmark to simulate a relatively
complex application that requires various username and password
combinations to access all the necessary data,
you'll also need to define them in your
smack file.
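For example, an additional client block for a second account might look like this sketch (the credentials and query barrel contents here are hypothetical, not part of the stock smack files):

client "reporter"
{
user "report";
pass "report_pass";
host "localhost";
db "test";
socket "/var/lib/mysql/mysql.sock";
query_barrel "1 select_by_username";
}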
3.3.3 MyBench: A Home-Grown Solution
MySQL
super-smack is a great tool, but it's not terribly
extensible unless you want to dive into the C++ code. When you need
custom logic that's not easy to express in
super-smack's configuration, it's
probably time to turn to your favorite scripting language.

When Jeremy encountered this problem back in 2001, he developed a
very simple Perl-based system called MyBench. It handles the details
of spawning clients, gathering and computing statistics, and so on.
The downside is that it's quite a bit heavier on the
client side. You really shouldn't run the benchmark
client on the same machine as MySQL.

You can download the code from http://jeremy.zawodny.com/mysql/mybench/. To
use it you'll need DBI,
DBD::mysql, and Time::HiRes
installed. The MyBench.pm module contains the
common logic. Creating a simple benchmark is a matter of adding your
logic to the supplied bench_example script.

As we did with super-smack, let's look through the
bench_example script to understand how it works. The
first few lines simply import the required modules and set up some
simple command-line option handling. It requires two command-line
arguments. The -n argument specifies the number of
clients to simulate (children to fork), and -r
sets the number of iterations each client will run. The optional
-h argument can specify a hostname.
#!/usr/bin/perl -w
use strict;
use MyBench;
use Getopt::Std;
use Time::HiRes qw(gettimeofday tv_interval);
use DBI;
my %opt;
Getopt::Std::getopt('n:r:h:', \%opt);
my $num_kids = $opt{n} || 10;
my $num_runs = $opt{r} || 100;
my $db = "test";
my $user = "test";
my $pass = "";
my $port = 3306;
my $host = $opt{h} || "192.168.0.1";
my $dsn = "DBI:mysql:$db:$host;port=$port";
Of course, you can hardcode the values if you'd
like, or you can make the script more generic by parameterizing the
connection information (db,
user, pass,
port, host).

With the setup out of the way, the script sets up a callback
function. It will be called by the code to set up an initial
connection and run the tests.
my $callback = sub
{
my $id = shift;
my $dbh = DBI->connect($dsn, $user, $pass, { RaiseError => 1 });
my $sth = $dbh->prepare("SELECT * FROM mytable WHERE ID = ?");
my $cnt = 0;
my @times = ( );
## wait for the parent to HUP me
local $SIG{HUP} = sub { };
sleep 600;
while ($cnt < $num_runs)
{
my $v = int(rand(100_000));
## time the query
my $t0 = [gettimeofday];
$sth->execute($v);
my $t1 = tv_interval($t0, [gettimeofday]);
push @times, $t1;
$sth->finish( );
$cnt++;
}
## cleanup
$dbh->disconnect( );
my @r = ($id, scalar(@times), min(@times), max(@times), avg(@times),
tot(@times));
return @r;
};
The callback first establishes a connection to the server and
prepares the query that will be executed. Next, it sets a few
variables and then sets a dummy signal handler. It then sleeps,
waiting for a SIGHUP. After the parent has started all the children,
it signals them to start using SIGHUP.

After the signal has been handled, the main loop starts. In each
iteration, it selects a random value to test, starts a timer,
executes the query, and stops the timer. The resulting time is pushed
to the @times list for later use. We finish the
statement to dispose of any returned data and increment the loop
counter.

After the loop completes, we disconnect from the server and return
the time information back to the caller:
my @results = MyBench::fork_and_work($num_kids, $callback);
MyBench::compute_results('test', @results);
exit;
__END__
The fork_and_work(
) subroutine from the MyBench
package is what gets everything rolling. The results are then passed
to compute_results( ) and printed. The first
argument passed is simply a name that will appear in the output to
identify the results.

Here's a simple run, using a SELECT
1 query with 10 clients for 100,000 iterations:
$ ./bench_select_1 -n 10 -r 100000
forking: ++++++++++
sleeping for 2 seconds while kids get ready
waiting: ----------
test: 1000000 7.5e-05 0.65045 0.000561082981999975 561.082981999975 17822.6756483597
clients : 10
queries : 1000000
fastest : 7.5e-05
slowest : 0.65045
average : 0.000561082981999975
serial : 561.082981999975
q/sec : 17822.6756483597
The first three lines are merely status updates so you can tell that
the test is doing something while it runs. The
test: line produces all the statistics on a single
line, suitable for processing in another script or pasting into a
spreadsheet. They're followed by human-readable
output.

There you can see how many clients were used, the total number of
queries executed, and the response times (in seconds) of the fastest and
slowest queries as well as the average. The serial
value explains approximately how many seconds the queries would have
taken if executed serially. Finally, the q/sec
number tells us how many queries per second (on average) the server
handled during the test.

Because the code times only the query and not the work done by the
Perl script, you can add arbitrarily complex logic to the main loop.
Rather than generate a random number, maybe you need to read a value
from a file or from another database table. Perhaps you need to run a
few special queries every 785th iteration, to simulate the behavior
of your real application. Doing so with MyBench would be easy; using
super-smack would be more of a challenge.
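As a rough sketch of that kind of customization, the main loop of the bench_example callback shown earlier might be adapted as follows. The product.dat file and the orders table are hypothetical, and the surrounding variables ($dbh, $sth, $cnt, $num_runs, @times) come from the callback itself:

## Read the test values extracted earlier instead of generating
## random numbers (the file and the orders table are hypothetical).
open my $fh, '<', '/tmp/product.dat' or die "can't open data file: $!";
chomp(my @ids = <$fh>);
close $fh;

## An extra query to run occasionally, mimicking the real application.
my $sth_extra = $dbh->prepare("SELECT COUNT(*) FROM orders WHERE product_id = ?");

while ($cnt < $num_runs)
{
    ## pick a real ID from the file rather than a random number
    my $v = $ids[ int(rand(scalar @ids)) ];

    ## time only the main query, as before
    my $t0 = [gettimeofday];
    $sth->execute($v);
    my $t1 = tv_interval($t0, [gettimeofday]);
    push @times, $t1;
    $sth->finish( );

    ## every 785th iteration, run the special query as well
    if ($cnt % 785 == 0)
    {
        $sth_extra->execute($v);
        $sth_extra->finish( );
    }

    $cnt++;
}

Because only the main query is timed, the extra work (reading the file, running the occasional special query) doesn't distort the reported statistics.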