High Performance MySQL [Electronic resources] نسخه متنی

2.4 Selecting the Right Engine

When
designing MySQL-based applications, you should decide which engine to
use for storing your data. If you don't think about
it during the design phase, you will likely face complications later
in the process. You might find that the default engine
doesn't provide a feature you need, such as
transactions. Or maybe the mix of read and write queries your
application generates will require more granular locking than
MyISAM's table locks.

Because you can make the choice on a table-by-table basis,
you'll need a clear idea of how each table is used
and the data it stores. Of course, it also helps to have a good
understanding of the application as a whole and its potential for
growth. Armed with this information, you can begin to make good
choices about which table engines can do the job.

2.4.1 Considerations

While there are many factors that can affect your decision, it
usually boils down to just a few considerations: transactions and
concurrency, backups, and special features.

2.4.1.1 Transactions and concurrency

When it comes
to transactions and concurrency, consider the following guidelines:

If your application requires transactions and high read/write
concurrency, InnoDB is probably your best bet.

If your application requires transactions but only moderate
read/write concurrency, either BDB or InnoDB tables should work fine.

If your application doesn't require transactions and
issues primarily SELECT or primarily
INSERT/UPDATE queries, MyISAM is a good choice.
Many web applications fall into this category.

2.4.1.2 Backups

The need to perform regular
backups may also influence your table choices. If your server can be
shut down at regular intervals for backups, the storage engines are
equally easy to deal with. However, if you need to perform online
backups in one form or another, the choices become less clear. Chapter 9 deals with this topic in more detail.

Another way of looking at this is simplicity. As
you'll see in Chapter 9, using
multiple storage engines increases the complexity of backups and
server tuning. You may decide that it's just easier
to use a single storage engine rather than those that are
theoretically best.

2.4.1.3 Special features

Finally, you sometimes find
that an application relies on particular features or optimizations
that are provided by only some of MySQL's storage
engines. For example, not all tables provide a quick answer to
queries like the following:

SELECT COUNT(*) FROM mytable

If your application depends on accurate and fast row counts,
MyISAM is the answer. InnoDB
must actually count up all the rows, but the MyISAM storage engine
always knows the exact row count of a table without the need to do
any work.

If your application requires referential integrity with foreign keys,
you're limited to just InnoDB tables. Do you need
full-text search capabilities? Only MyISAM tables provide it.

Keep this in mind as you read the more detailed information about
each table's features. There will come a time when
you find that the feature you really, really need is available only
in one table engine. When that happens, you need to either compromise
or break a table into multiple tables of different types.

2.4.2 Practical Examples

These issues may seem rather abstract
without some sort of real-world context. So let's
consider some common uses for tables in various database
applications. For each table, we'll look at which
engine best matches with the table's needs. The
details of each engine are covered in the next section.

2.4.2.1 Logging

Suppose
you want to use MySQL to log a record of every telephone call from a
central telephone switch in real time. Or maybe
you've installed mod_log_sql
for Apache so you can log all visits to your web site directly in a
table. In such an application, speed is probably the most important
goal; you don't want the database to be the
bottleneck. Using MyISAM tables works very well because they have
very low overhead and can handle inserting thousands of records per
second.

Things will get interesting if you decide it's time
to start running reports to summarize the data
you've logged. Depending on the queries you use,
there's a good chance you will significantly slow
the process of inserting records while gathering data for the report.
What can you do?

You can use MySQL's built-in replication (Chapter 7) to clone the data onto a second (slave)
server. You can then run your time- and CPU-intensive queries against
the data on the slave. This keeps the master free to insert records
as fast as it possibly can while also giving you the freedom to run
any query you want without worrying about how it could affect the
real-time logging.

Another option is to use a MyISAM Merge
table. Rather than always logging to the same table, adjust the
application to log to a table that contains the name or number of the
month in its name, such as web_logs_2004_01 or
web_logs_2004_jan. Then define a Merge table that
contains the data you'd like to summarize and use it
in your queries. If you need to summarize data daily or weekly, the
same strategy works; you just need to create tables with more
specific names, such as web_logs_2004_01_01. While
you're busy running queries against tables that are
no longer being written to, your application can log records to its
current table uninterrupted. Merge tables are discussed in the later
section "MyISAM Merge tables."

A final possibility is simply to switch to using a table that has
more granular locking than MyISAM does. Either BDB or InnoDB works
well in this case. Non-MyISAM tables will generally use more CPU and
disk space, but that may be a reasonable tradeoff in this case. Also,
in the event of a crash, MyISAM tables may take quite a long time to
check and repair while InnoDB tables should recover quickly.

2.4.2.2 Read-only or read-mostly tables

Tables that contain
the data used to construct a catalog or listing of some sort (jobs,
auctions, real estate, etc.) are usually read from far more often
than they are written to. This makes them great candidates for
MyISAM.

2.4.2.3 Order processing

When you deal with any sort of order
processing, transactions are a requirement. Half-completed orders
aren't going to endear customers to your service.
Using transaction-safe table types (InnoDB or BDB), these unfortunate
"data surprises" can be avoided.
Considering that BDB tables useat bestlocking at the
page level, applications with high transaction volumes should
consider InnoDB tables.

In the case of order processing, InnoDB has a distinct advantage
because it supports referential integrity through the use of foreign
keys. These keys allow a field in one table to have an enforced
relationship to the key of another table (e.g., an
Order record contains a
CustomerID field that
"points" to the primary key of the
Customer table). Foreign keys effectively point to
those other tables and indicate that data is maintained in them, and
they help you keep data consistent across your tables. (Keep in mind
that a foreign key in an InnoDB table must reference another InnoDB
table. Currently they can't cross storage engines.)

You might want to design your tables so that customers
can't be removed without also removing all their
orders. Or maybe you'd like to ensure that products
aren't deleted from the catalog table before the
orders that reference those products are archived. With
InnoDB's foreign keys, you can.

2.4.2.4 Stock quotes

If you're collecting
stock quotes for your own analysis, MyISAM tables work great.
However, if you're running a high-traffic web
service that has a real-time quote feed and thousands of users, a
query should never have to wait. At any time, there could be many
clients attempting to read and write to the table, so the row-level
locking provided by InnoDB tables is the way to go.

If you have sufficient memory, MySQL's in-memory
Heap tables might be an option, too. However, their indexes have some
interesting restrictions you need to investigate first. See
Section 4.3.2 in Chapter 4 for more details.

2.4.2.5 Bulletin boards and threaded discussion forums

Threaded discussions are an interesting
problem for MySQL users. There are hundreds of freely available PHP
and Perl-based systems available that provide threaded discussions.
Many of them aren't written with database efficiency
in mind, so they tend to perform a large number of queries for each
request they serve, as well as updates to counters and usage
statistics about the various discussions. Many of the systems also
use a small number of monolithic tables to store all their data. As a
result, a few central tables become the focus of heavy read and write
activity, and the locks required to enforce concurrency become a
substantial source of contention.

Despite their design shortcomings, most of the systems work well for
small and medium loads. However, if a web site grows large enough and
generates a significant amount of traffic, it may begin to get very
slow. The obvious solution is to switch to a different table type
that can handle the heavy read/write volume. Users who have attempted
this are sometimes surprised to find that the system runs even more
slowly than it did before!

What they don't realize is that the system is using
a particular query, normally something like:

SELECT COUNT(*) FROM table WHERE ...

The problem is that not all engines can run that query quickly.
MyISAM tables keep accurate row counts available, so they can. But
BDB and InnoDB must actually scan the data to count all the rows. The
developers of the popular web site Slashdot (http://slashdot.org/) ran into this problem
when they moved their system from MyISAM to InnoDB tables. They spent
time going through their code to eliminate all those queries.

MySQL's query cache, which we'll
cover in more detail in Chapter 5, can often be
a big help in situations in which an application issues the same
query over and over with the same parameters.

2.4.2.6 CD-ROM applications

If you ever need to distribute a
CD-ROM- or DVD-ROM-based application that uses MySQL data files,
consider using MyISAM or Compressed MyISAM tables. They can be easily
isolated and copied to other media. Compressed MyISAM tables take far
less space than uncompressed ones, but they are read-only. Since the
data is going to be on read-only media anyway,
there's little reason not to use compressed tables.

2.4.3 Table Conversions

Several techniques are available
to convert one table type to another, each with advantages and
disadvantages. In the following sections, we cover three of the most
common.

2.4.3.1 ALTER TABLE

The easiest way to move a table from one engine to another is by
using an
ALTER TABLE statement.
The following command converts mytable to BDB:

ALTER TABLE mytable TYPE = BDB;

As of MySQL Versions 4.0.18 and 4.1.2, you may use
ENGINE instead of TYPE. In a
later version of MySQL (probably in the 5.x series), support for
TYPE will be removed entirely.

The previous syntax works for all storage engines, but
there's a catch: it can take a lot of time. MySQL
will perform a row-by-row copy of your old table into your new table.
During that time, you'll probably be using all the
server's disk I/O capacity, and the original table
will be locked while the conversion runs. So take care before trying
this technique on a busy table. Instead, you can use one of the
following methods, which involve making a copy of the table first.

2.4.3.2 Dump and reimport

To gain more control over the process,
you might choose to dump the table to a text file using the
mysqldump
utility. Once the table is dumped, simply edit the dump file to
adjust the CREATE TABLE
statement it contains. Be sure to change the table name as well as
its type because you can't have two tables with the
same name in the same database even if they are of different types.

If you import into InnoDB or BDB, be sure to use the
--no-autocommit option to disable
AUTOCOMMIT mode. Otherwise each individual insert
will be performed in its own transaction.

The downside of using mysqldump is that it
isn't terribly fast and uses far more disk space.
Not only will the dump file contain all the data from the table, it
will also contain all the SQL necessary to repopulate the table.
Also, you won't be able to delete the dump file
until the new table has been created.

Furthermore, if the dump file happens to be quite large, editing it
can be a challenge. You can't simply load a 6-GB
file into vi or emacs on
most systems.^[2] Instead,
you'll need to craft a Perl or
sed script to do the job.

^[2] Maybe you can, but
it'll be pretty painful.

2.4.3.3 CREATE and SELECT

The third technique is a compromise
between the speed of the first mechanism and the safety of the
second. Rather than dumping the entire table or converting it all at
once, you create the new table and use MySQL's
INSERT INTO
... SELECT syntax to populate
it incrementally. If, for example, you have a MyISAM table called
myisam_table that you'd like to
convert to an InnoDB table named innodb_table, you
need to run queries like this:

BEGIN;
INSERT INTO innodb_table SELECT * FROM myisam_table WHERE id BETWEEN x AND y;
COMMIT;

Assuming that id is the primary key, you run that
query using larger values of x and
y each time until all the data has been copied to
the new table. After doing so, you are left with the original table,
which you can drop after you're done with it, and
the new table, which is now fully populated.

Alternatively, if you use MySQL 4.1 or newer, you can create the new
table and copy the table in two steps:

CREATE TABLE newtable LIKE mytable;
INSERT INTO newtable SELECT * FROM mytable;

Whichever method you use, if you're dealing with a
large volume of data, it's often more efficient to
copy the data before adding indexes to the new table.