High Performance MySQL [Electronic resources] نسخه متنی

2.3 Transactions

You can't examine
the more advanced features of a database system for very long before
transactions enter the mix. A
transaction is
a group of SQL queries that are treated
atomically, as a single unit of work. Either
the entire group of queries is applied to a database, or none of them
are. Little of this section is specific to MySQL. If
you're already familiar with ACID transactions, feel
free to skip ahead to the section "Transactions in
MySQL."

A banking application is the classic example of why transactions are
necessary. Imagine a bank's database with a two
tables: checking and savings. To move $200 from
Jane's checking account to her savings account, you
need to perform at least three steps:

Make sure her checking account balance is greater than $200.

Subtract $200 from her checking account balance.

Add $200 to her savings account balance.

The entire operation should be wrapped in a transaction so that if
any one of the steps fails, they can all be rolled back.

A transaction is initiated (or opened) with the
BEGIN statement and applied with
COMMIT or rolled back
(undone) with ROLLBACK. So the SQL for the transaction might
look like this:

         BEGIN;
[step 1] SELECT balance FROM checking WHERE customer_id = 10233276;
[step 2] UPDATE checking SET balance = balance - 200.00 WHERE customer_id = 10233276;
[step 3] UPDATE savings  SET balance = balance + 200.00 WHERE customer_id = 10233276;
COMMIT;

But transactions alone aren't the whole story. What
happens if the database server crashes while performing step 3? Who
knows? The customer probably just lost $200. What if another process
comes along between Steps 2 and 3 and removes the entire checking
account balance? The bank has given the customer a $200 credit
without even knowing it.

Simply having transactions isn't sufficient unless
the database server passes what is known as the ACID
test. ACID is an acronym
for Atomicity, Consistency, Isolation, and Durabilityfour
tightly related criteria that are required in a well-behaved
transaction processing system. Transactions that meet those four
criteria are often referred to as ACID
transactions.

Atomicity

Transactions must function as a single indivisible unit of work. The
entire transaction is either applied or rolled back. When
transactions are atomic, there is no such thing as a partially
completed transaction: it's all or nothing.

Consistency

The database should always move from one consistent state to the
next. Consistency ensures that a crash between Steps 2 and 3
doesn't result in $200 missing from the checking
account. Because the transaction is never committed, none of the
transaction's changes are ever reflected in the
database.

Isolation

The results of a transaction are usually invisible to other
transactions until the transaction is complete. This ensures that if
a bank account summary runs after Step 2, but before Step 3, it still
sees the $200 in the checking account. When we discuss isolation
levels, you'll understand why we said
usually invisible.

Durability

Once committed, the results of a transaction are permanent. This
means that the changes must be recorded in such a way that system
crashes won't lose the data. Of course, if the
database server's disks fail, all bets are off.
That's a hardware problem. We'll
talk more about how you can minimize the effects of hardware failures
in Chapter 6.

2.3.1 Benefits and Drawbacks

ACID transactions ensure
that banks don't lose your money. By wrapping
arbitrarily complex logic into single units of work, the database
server takes some of the burden off application developers. The
database server's ACID properties offer guarantees
that reduce the need for code guarding against race conditions and
handling crash recovery.

The downside of this extra security is that the database server has
to do more work. It also means that a database server with ACID
transactions will generally require more CPU power, memory, and disk
space than one without them. As mentioned earlier, this is where
MySQL's modularity comes into play. Because you can
decide on a per-table basis if you need ACID transactions or not, you
don't need to pay the performance penalty on a table
that really won't benefit from transactions.

2.3.2 Isolation Levels

The previous description of isolation
was a bit simplistic. Isolation is more complex than it might first
appear because of some peculiar cases that can occur. The SQL
standard defines four isolation levels with specific rules for which
changes are and aren't visible inside and outside a
transaction. Let's look at each isolation level and
the type of problems that can occur.

2.3.2.1 Read uncommitted

In the read
uncommitted isolation level, transactions can view
the results of uncommitted transactions. At this level, many problems
can occur unless you really, really know what you are doing and have
a good reason for doing it. Read uncommitted is rarely used in
practice. Reading uncommitted data is also known as a
dirty read.

2.3.2.2 Read committed

The default isolation level for most database systems is
read committed. It satisfies the
simple definition of isolation used earlier. A transaction will see
the results only of transactions that were already committed when it
began, and its changes won't be visible to others
until it's committed.

However, there are problems that can occur using that definition. To
visualize the problems, refer to the sample data for the
Stock and StockPrice tables as
shown in Table 2-2 and Table 2-3.

Table 2-2. The Stock table
id	Ticker	Name
1	MSFT	Microsoft
2	EBAY	eBay
3	YHOO	Yahoo!
4	AMZN	Amazon

Table 2-3. The StockPrice table
stock_id	date	open	high	low	close
1	2002-05-01	21.25	22.30	20.18	21.30
2	2002-05-01	10.01	10.20	10.01	10.18
3	2002-05-01	18.23	19.12	18.10	19.00
4	2002-05-01	45.55	46.99	44.87	45.71
1	2002-05-02	21.30	21.45	20.02	20.21
2	2002-05-02	10.18	10.55	10.10	10.35
3	2002-05-02	19.01	19.88	19.01	19.22
4	2002-05-02	45.69	45.69	44.03	44.30

Imagine you have a Perl script that runs nightly to fetch price data
about your favorite stocks. For each stock, it fetches the data and
adds a record to the StockPrice table with the
day's numbers. So to update the information for
Amazon.com, the transaction might look like this:

BEGIN;
SELECT @id := id FROM Stock WHERE ticker = 'AMZN';
INSERT INTO StockPrice VALUES (@id, '2002-05-03', 20.50, 21.10, 20.08, 21.02);
COMMIT;

But what if, between the select and insert, Amazon's
id changes from 4 to 17 and a new stock is added
with id 4? Or what if Amazon is removed entirely?
You'll end up inserting a record with the wrong
id in the first case. And in the second case,
you've inserted a record for which there is no
longer a corresponding row in the Stock table.
Neither of these is what you intended.

The problem is that you have a nonrepeatable
read in the query. That is, the data you read
in the SELECT becomes invalid by the time you
execute the INSERT. The repeatable read isolation
level exists to solve this problem.

2.3.2.3 Repeatable read

At the repeatable
read isolation level, any rows that are read
during a transaction are locked so that they can't
be changed until the transaction finishes. This provides the perfect
solution to the problem mentioned in the previous section, in which
Amazon's id can change or vanish
entirely. However, this isolation level still leaves the door open to
another tricky problem: phantom reads.

Using the same data, imagine that you have a script that performs
some analysis based on the data in the StockPrice
table. And let's assume it does this while the
nightly update is also running.

The analysis script does something like this:

BEGIN;
SELECT * FROM StockPrice WHERE close BETWEEN 10 and 20;
// think for a bit
SELECT * FROM StockPrice WHERE close BETWEEN 10 and 20;
COMMIT;

But the nightly update script inserts between those two queries new
rows that happen to match the close BETWEEN 10 and 20 condition. The second query will find more rows that the
first one! These additional rows are known as phantom
rows (or simply phantoms). They
weren't locked the first time because they
didn't exist when the query ran.

Having said all that, we need to point out that this is a bit more
academic than you might think. Phantom rows are such a common problem
that InnoDB's locking (known as next-key
locking) prevents this from happening. Rather
than locking only the rows you've touched in a
query, InnoDB actually locks the slot following them in the index
structure as well.

2.3.2.4 Serializable

The highest level of isolation,
serializable, solves the phantom read
problem by ordering transactions so that they can't
conflict. At this level, a lot of timeouts and lock contention may
occur, but the needs of your application may bring you to accept the
decreased performance in favor of the data stability that results.

Table 2-4 summarizes the various isolation levels
and the drawbacks associated with each one. Keep in mind that as you
move down the list, you're sacrificing concurrency
and performance for increased safety.

Table 2-4. ANSI SQL isolation levels
Isolation level	Dirty reads possible	Non-repeatable reads possible	Phantom reads possible
Read uncommitted	Yes	Yes	Yes
Read committed	No	Yes	Yes
Repeatable read	No	No	Yes
Serializable	No	No	No

2.3.3 Deadlocks

Whenever
multiple transactions obtain locks, there is the danger of
encountering a deadlock condition. Deadlocks occur when two
transactions attempt to obtain conflicting locks in a different
order.

For example, consider these two transactions running against the
StockPrice table:

Transaction #1:

BEGIN;
UPDATE StockPrice SET close = 45.50 WHERE stock_id = 4 and date = '2002-05-01';
UPDATE StockPrice SET close = 19.80 WHERE stock_id = 3 and date = '2002-05-02';
COMMIT;

Transaction #2:

BEGIN;
UPDATE StockPrice SET high  = 20.12 WHERE stock_id = 3 and date = '2002-05-02';
UPDATE StockPrice SET high  = 47.20 WHERE stock_id = 4 and date = '2002-05-01';
COMMIT;

If you're unlucky, each transaction will execute its
first query and update a row of data, locking it in the process. Each
transaction will then attempt to update its second row only to find
that it is already locked. Left unchecked, the two transactions will
wait for each other to completeforever.

To combat this problem, database systems implement various forms of
deadlock detection and timeouts. The more sophisticated systems, such
as InnoDB, will notice circular dependencies like the previous
example and return an error. Others will give up after the query
exceeds a timeout while waiting for a lock. InnoDB's
default timeout is 50 seconds. In either case, applications that use
transactions need to be able to handle deadlocks and possibly retry
transactions.

2.3.4 Transaction Logging

Some of the overhead involved with
transactions can be mitigated through the use of a transaction log.
Rather than directly updating the tables on disk each time a change
occurs, the system can update the in-memory copy of the data (which
is very fast) and write a record of the change to a
transaction log on disk. Then, at some later time, a
process (or thread) can actually apply the changes that the
transaction log recorded. The serial disk I/O required to append
events to the log is much faster than the random seeks required to
update data in various places on disk.

As long as events are written to the transaction log before a
transaction is considered committed, having the changes in a log will
not affect the durability of the system. If the database server
crashes before all changes have been applied from the transaction
log, the database will continue applying changes from the transaction
log when it is restarted and before it accepts new connections.

2.3.5 Transactions in MySQL

MySQL provides two
transaction-safe storage engines: Berkeley DB (BDB) and
InnoDB. Their specific properties are discussed in next section. Each
one offers the basic
BEGIN/COMMIT/ROLLBACK
functionality. They differ in their supported isolation levels,
locking characteristics, deadlock detection, and other features.

2.3.5.1 AUTOCOMMIT

By default MySQL operates in
AUTOCOMMIT mode. This means that unless
you've explicitly begun a transaction, it
automatically executes each query in a separate transaction. You can
enable AUTOCOMMIT for the current connection by
running:

SET AUTOCOMMIT = 1;

Disable it by executing:

SET AUTOCOMMIT = 0;

Changing the value of AUTOCOMMIT has no effect on
non-transaction-safe tables such as MyISAM or HEAP.

2.3.5.2 Implicit commits

Certain commands, when issued during an
open transaction, cause MySQL to commit the transaction before they
execute. Typically these are commands that make significant changes,
such as removing or renaming a table.

Here is the list of commands for which MySQL implicitly commits a
transaction:

ALTER TABLE

BEGIN

CREATE INDEX

DROP DATABASE

DROP TABLE

RENAME TABLE

TRUNCATE

LOCK TABLES

UNLOCK TABLES

As additional features are added to MySQL, it is possible that other
commands will be added to the list, so be sure to check the latest
available documentation.

2.3.5.3 Isolation levels

MySQL allows you to set the
isolation level using the SET TRANSACTION ISOLATION LEVEL command.
Unless otherwise specified, the isolation level is changed beginning
with the next transaction.

To set the level for the whole session (connection), use:

SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED

Here's how to set the global level:

SET GLOBAL TRANSACTION ISOLATION LEVEL SERIALIZABLE

MySQL recognizes all four
ANSI standard isolation levels, and as of Version 4.0.5 of MySQL,
InnoDB supports all of them:

READ UNCOMMITTED

READ COMMITTED

REPEATABLE READ

SERIALIZABLE

The default isolation level can also be set using the
--transaction-isolation command-line option when
starting the server or set via my.cnf.

2.3.5.4 Mixing storage engines in transactions

Transaction management in MySQL
is currently handled by the underlying storage engines, not at a
higher level. Thus, you can't reliably mix tables
stored in transactional engines (such as InnoDB and BDB) in a single
transaction. A higher-level transaction management service may
someday be added to MySQL, making it safe to mix and match
transaction-safe tables in a transaction. Until then,
don't expect it to work.

If you mix transaction-safe and non-transaction-safe tables (such as
InnoDB and MyISAM) in a transaction, the transaction will work
properly if all goes well. However, if a rollback is required, the
changes to the non-transaction-safe table won't be
undone. This leaves the database in an inconsistent state that may be
difficult to recover from (and renders the entire point of
transactions moot).

2.3.5.5 Simulating transactions

At times you may
need the behavior of transactions when you aren't
using a transaction-safe table. You can achieve something like
transactions using MySQL's
LOCK TABLES and
UNLOCK TABLES commands. If you
lock the tables that will be involved in the transaction and keep
track of any changes that you make (in case you need to simulate a
rollback), you'll have something equivalent to
running at the serializable isolation level. But the process is
kludgy and error prone, so if you really need transactions, we
recommend using a transactional storage engine.